This paper was published by the ACM in the Proceedings of the 1st Workshop on Performance Analysis of Big Data Systems in February 2015.
The choice of a particular NoSQL database imposes a specific distributed software architecture and data model, and is a major determinant of overall system throughput. NoSQL database performance is in turn strongly influenced by how well the data model and query capabilities fit the application use cases, so system-specific testing and characterization are required. This paper presents the method and results of a study that selected among three NoSQL databases for a large, distributed healthcare organization. Although the method and study considered consistency, availability, and partition tolerance (CAP) tradeoffs, along with other quality attributes that influence the selection decision, this paper reports on the performance evaluation method and results. In our testing, a typical workload and configuration produced throughput that varied from 225 to 3200 operations per second across database products, while read operation latency varied by a factor of 5 and write latency by a factor of 4 (with the highest-throughput product delivering the highest latency). We also found that achieving strong consistency reduced throughput by 10-25% compared to eventual consistency.