Monday, October 10, 2011

Dynamo: Amazon’s Highly Available Key-value Store


Dynamo is Amazon.com's highly available key-value storage system, running on a cluster of thousands of commodity servers. It mainly serves e-commerce applications, where the strong emphasis on scalability makes traditional RDBMSs impractical (parallelizing writes is virtually impossible). Instead, Amazon chose a simple key-value approach with only get/put operations, gaining scalability while giving up some virtues of RDBMSs such as schemas, transactions, and strong consistency. The main goal is high availability, even if an entire datacenter fails.

As a key-value store distributed over multiple nodes, Dynamo is based on a Chord-like, ring-based distributed hash table (DHT) algorithm. However, because of the characteristics of its applications, its key design principles differ from those of prior DHT systems: data storage must be highly redundant; tail latency matters for meeting the SLA; and the whole system is run by a single administrative domain. Redundancy introduces a tricky problem: consistency. As nodes leave and join, replicas may end up holding different versions of a value. Dynamo repairs this short-term inconsistency asynchronously, an approach known as eventual consistency.
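To make the ring idea concrete, here is a minimal sketch of how a key could be mapped to its replica nodes on a hash ring. It is an illustration under my own assumptions, not Amazon's implementation: the `Ring` class, `ring_position`, `preference_list`, and the node names are all hypothetical, each node gets a single position on the ring (no virtual nodes), and MD5 is used only as a convenient way to spread keys.

```python
import bisect
import hashlib


def ring_position(name: str) -> int:
    """Hash a string to a 128-bit integer position on the ring."""
    return int.from_bytes(hashlib.md5(name.encode("utf-8")).digest(), "big")


class Ring:
    """Toy hash ring: one position per node, N replicas per key (assumed design)."""

    def __init__(self, nodes, replicas=3):
        self.replicas = replicas
        # Sorted (position, node) pairs represent the ring in clockwise order.
        self._points = sorted((ring_position(n), n) for n in nodes)

    def preference_list(self, key):
        """Return the first `replicas` nodes found clockwise from the key."""
        if not self._points:
            return []
        positions = [p for p, _ in self._points]
        start = bisect.bisect_right(positions, ring_position(key))
        count = min(self.replicas, len(self._points))
        size = len(self._points)
        return [self._points[(start + i) % size][1] for i in range(count)]

    def remove(self, node):
        """Drop a node from the ring, e.g. after it fails or leaves."""
        self._points = [(p, n) for p, n in self._points if n != node]


ring = Ring(["node-a", "node-b", "node-c", "node-d", "node-e"], replicas=3)
before = ring.preference_list("cart:42")
ring.remove(before[0])
after = ring.preference_list("cart:42")
```

After the first replica is removed, the two remaining replicas still hold the key and only one new node joins the list. That locality is what makes ring-based placement attractive when nodes come and go.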

Dynamo is closed-source, which is somewhat disappointing given that open-source software has been key to the success of Amazon.com. Apache Cassandra, originally developed at Facebook, is an open-source system that borrows much of its design from Dynamo.
