I gave a presentation on RAMCloud in my class. The slides are available here.
RAMCloud is an ongoing project at Stanford University that aims to build a new storage layer for data centers. RAMCloud differs from previous data center storage systems in several ways, and the sharpest difference is that RAMCloud stores all data in DRAM, yet, unlike memcached, the data is durable.
This all-data-in-DRAM feature leads to interesting consequences, such as:
- RAMCloud has orders of magnitude higher performance (10 µs latency, 1M operations/s) than disk-based storage systems.
- RAMCloud should be free (at least in theory) from the tail latency problem, which is mainly caused by disk seek latency.
- RAMCloud keeps no in-memory replicas; every data item is served by a single machine. Other storage systems use replicas for two main reasons: load balancing and fault tolerance. Since RAMCloud has much higher per-node performance, load balancing is not a critical issue. For fault tolerance, RAMCloud has a cool feature called "Fast Recovery", which recovers the data of a dead node in 1-2 seconds. The fast recovery process was described in a paper at SOSP 2011.
- Even though RAMCloud's data model is a simple key-value store, its low latency makes it feasible to build more complex data structures (e.g., relations) on top of it.
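To make the last point concrete, here is a minimal sketch, assuming a toy in-memory key-value store. The `KVStore` class, its `read`/`write` methods, and the key layout are all hypothetical and are not RAMCloud's actual client API. The sketch layers a small "users" relation with a secondary index on email over the store and counts how many dependent round trips a lookup needs. The ~10 ms figure for a random disk access is an assumption used only for comparison.

```python
# Hypothetical sketch: a relation with a secondary index layered on a
# plain key-value store. The dict-backed KVStore stands in for RAMCloud;
# it is NOT RAMCloud's client API.
import json


class KVStore:
    """Toy key-value store that counts operations (one round trip each)."""

    def __init__(self):
        self._data = {}
        self.ops = 0

    def read(self, key):
        self.ops += 1
        return self._data.get(key)

    def write(self, key, value):
        self.ops += 1
        self._data[key] = value


class UserTable:
    """A 'users' relation: rows keyed by id, plus an index on email."""

    def __init__(self, kv):
        self.kv = kv

    def insert(self, user_id, name, email):
        row = json.dumps({"id": user_id, "name": name, "email": email})
        self.kv.write(f"users/row/{user_id}", row)
        # Secondary index entry: email -> primary key.
        self.kv.write(f"users/idx/email/{email}", str(user_id))

    def find_by_email(self, email):
        # Two dependent round trips: index lookup, then row fetch.
        user_id = self.kv.read(f"users/idx/email/{email}")
        if user_id is None:
            return None
        return json.loads(self.kv.read(f"users/row/{user_id}"))


def estimated_latency_us(round_trips, per_op_us):
    """Latency of a chain of dependent (sequential) store accesses."""
    return round_trips * per_op_us


kv = KVStore()
users = UserTable(kv)
users.insert(1, "alice", "alice@example.com")
kv.ops = 0  # count only the query's operations
row = users.find_by_email("alice@example.com")
print(row["name"], kv.ops)
# 10 us per access (figure quoted above) vs. an assumed ~10 ms per disk access.
print(estimated_latency_us(kv.ops, 10), estimated_latency_us(kv.ops, 10_000))
```

An index lookup costs two dependent reads, so roughly 20 µs at the latency quoted above versus roughly 20 ms under the assumed disk figure; queries that chain many such lookups (joins, multi-level indexes) widen the gap accordingly. The two writes in `insert` are not atomic, so a real design would also need some way to keep a row and its index entry consistent.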
RAMCloud still has some hurdles to overcome:
- Currently, the high throughput and low latency of the prototype depend heavily on InfiniBand (a kind of free riding). It is not clear whether this approach can scale to an entire data center.
- Tail latency also comes from overloaded networks and servers, not just from disks.
- Unlike other storage systems from industry (which were built to meet their creators' own needs), it is somewhat unclear what the killer application of RAMCloud would be.
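Why overloaded servers and networks matter for tail latency can be seen with a back-of-the-envelope model. This is an illustrative assumption, not a RAMCloud measurement: a request fans out to n servers and must wait for all of them, and each server is independently "slow" (for example overloaded, or behind a congested switch) with probability p. The request is then slow whenever at least one server is slow, which happens with probability 1 - (1 - p)^n. The function name `p_request_slow` and the value p = 0.01 are hypothetical.

```python
# Back-of-the-envelope fan-out model (illustrative assumption, not a
# RAMCloud measurement). A request waits for all `fanout` servers; each is
# independently slow with probability `p_server_slow`, so the request is
# slow if at least one server is slow.


def p_request_slow(p_server_slow, fanout):
    """Probability that at least one of `fanout` independent servers is slow."""
    return 1.0 - (1.0 - p_server_slow) ** fanout


for fanout in (1, 10, 100, 1000):
    print(f"fan-out {fanout:4d}: {p_request_slow(0.01, fanout):.3f}")
```

With a 1% chance of any single server being slow, the chance that a request is slow grows to roughly 10% at a fan-out of 10, about 63% at 100, and nearly 100% at 1000. Fast DRAM alone therefore does not remove the tail once requests touch many servers.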