This conference paper was published by the IEEE Computer Society Press in the Proceedings of the 2015 IEEE International Congress on Big Data, June/July 2015, pages 526–534.
The selection of a particular NoSQL database for use in a big data system imposes a specific distributed software architecture and data model, making the technology selection difficult to defer and expensive to change. This paper reports on the selection of a NoSQL database for use in an Electronic Healthcare Record system being developed by a large healthcare provider. We performed application-specific prototyping and measurement to identify NoSQL products that fit the data model and query use cases and meet the performance requirements. We found that, across products, database throughput varied by a factor of 10, read operation latency by a factor of 5, and write latency by a factor of 4 (with the highest-throughput product delivering the highest latency). We also found that throughput for workloads using strong consistency was 10–25% lower than for workloads using eventual consistency. We conclude by reflecting on some of the fundamental difficulties, which became apparent during our study, of performing detailed technical evaluations of NoSQL databases specifically and of big data systems in general.