Saturday, September 3, 2011

Warehouse-Scale Computing: Entering the Teenage Decade

Google's Luiz Andre Barroso gave a plenary talk at the 2011 Federated Computing Research Conference. A video of his presentation, Warehouse-Scale Computing: Entering the Teenage Decade, is available here.

According to Barroso, Google was simply a software engineering company 10 years ago. Now it is becoming more than that, with fairly sophisticated facilities of its own, not to mention its diversifying business areas. Its recent datacenter building designs show that "a collection of separate machines" is no longer the only way to describe what Google is.

I want to summarize (or over-simplify) Google's struggle over the last 10+ years as "better service at lower cost". For better service, it has needed more computation, more bandwidth, more fault tolerance, and lower latency. For lower cost, it has pursued optimal hardware configurations and better power efficiency. Barroso draws an analogy between the traits of a teenager (moody, immature, impatient, retro, etc.) and Google's maturing platforms to introduce the current challenges of modern datacenters.

Among the many examples, the emphasis on I/O is worth noting. Barroso stresses latency in particular, because tail latency is very important in large-scale, massively parallel environments such as Google's. While many hardware and software technologies have been borrowed from the commodity market to get a free ride on economies of scale, they have often turned out to misbehave, especially in terms of latency. He gives two examples, flash and TCP/IP, which were developed for a millisecond world and need to be adapted to a microsecond-critical one.
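
A back-of-the-envelope calculation helps show why tail latency matters so much once a request fans out across many machines. The sketch below is my own illustration, not something taken from the talk: it assumes each leaf server is independently "slow" with some small probability, and that a request is slow if any one of the leaves it touches is slow. The function name and the 1% figure are hypothetical.

```python
def prob_slow_request(p_slow_leaf: float, fanout: int) -> float:
    """Probability that at least one of `fanout` leaves is slow.

    Assumes leaves are slow independently of each other (a simplifying
    assumption for illustration, not a claim about any real system).
    """
    # P(all leaves fast) = (1 - p)^fanout, so P(any slow) is its complement.
    return 1.0 - (1.0 - p_slow_leaf) ** fanout


if __name__ == "__main__":
    # Hypothetical: each leaf misses its latency target 1% of the time.
    for fanout in (1, 10, 100, 1000):
        print(f"fanout={fanout:5d}  P(slow request)={prob_slow_request(0.01, fanout):.3f}")
```

Under these assumptions, a 1-in-100 slow leaf makes roughly 63% of requests slow at a fan-out of 100, which is why a rare hiccup on a single machine becomes the common case at scale.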

How will our teenager turn out? Although the future is hard to predict, he gives clear milestones for the journey: how to handle temporary power peaks in order to make better use of power, how to achieve high bisection bandwidth, and how to close the gap between DRAM and disks.
