Friday, September 2, 2011

The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines (2/2)

Chapter 3 - Hardware Building Blocks

This chapter shows why datacenters are full of commodity off-the-shelf computers rather than rack-sized, room-sized, or building-sized supercomputers. The authors compare a large SMP server with a PC-class server. The most important factor is price/performance: how much work can you do for the same amount of money?

Of course, single-machine experiments are not enough to make the case for datacenter computing, so the authors build a simple model to estimate how well large and small computers perform on parallel workloads. They also explore very low-end alternatives, such as Atom processors, though that discussion is qualitative rather than quantitative, and they don't mention the 'hybrid' approach.
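To get a feel for what such a model looks like, here is a minimal sketch in Python. It is my own illustration, not the authors' exact formulation: I assume a task costs a fixed amount of local computation plus a number of accesses to global data, that those accesses are spread uniformly over the nodes, and that a local access costs about 100 ns while a remote one costs about 100 µs. The function names and all the numbers are hypothetical.

```python
def avg_global_access_ns(nodes, local_ns=100.0, remote_ns=100_000.0):
    """Average latency of one global-data access on a cluster of `nodes` machines.

    Assumption: accesses are uniformly distributed, so a fraction 1/nodes
    of them land in local memory and the rest go over the network.
    """
    local_fraction = 1.0 / nodes
    return local_fraction * local_ns + (1.0 - local_fraction) * remote_ns


def task_time_ms(nodes, compute_ms=1.0, global_accesses=100):
    """Fixed local computation plus the latency penalty of global accesses."""
    penalty_ms = global_accesses * avg_global_access_ns(nodes) / 1e6
    return compute_ms + penalty_ms


if __name__ == "__main__":
    # Illustrative sweep: 1 node stands in for one big shared-memory machine,
    # larger counts stand in for a cluster of smaller servers.
    for nodes in (1, 2, 4, 8, 16, 32):
        print(f"{nodes:>2} nodes: {task_time_ms(nodes):.3f} ms per task")
```

Under these assumptions the big machine wins on a single task as soon as any access has to cross the network, and the gap narrows when tasks do more local work per global access. Whether that advantage justifies the price is exactly the price/performance question above.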

Unfortunately, the chapter does not say much about power efficiency, which is another important cost factor.

Chapter 4 - Datacenter Basics

Cloud computing is conceptually foggy, but datacenters physically exist. With lots and lots of servers. The servers take their place. They should never fail. They convert power into heat, accidentally doing some computation as a side effect. So, what have you got to do?

This chapter explains the physical design of datacenters: how to feed power to the servers, how to cool them, and how to make the whole thing fault-tolerant.

Chapter 7 - Dealing with Failures and Repairs

Web-based services running in a datacenter should always be available, but the individual components in the datacenter are prone to failure. This chapter begins with how hardware and software can mitigate failures to keep the service available, and explains the trade-off between hardware and software approaches.

The authors also categorize faults by both severity and cause. (Not) surprisingly, human-related causes, such as configuration errors and software bugs, dominate faults in datacenters. This is somewhat bad news, since human-related faults tend not to stay isolated within a single device, while hardware-related faults are largely statistically independent.
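A small back-of-the-envelope sketch shows why the lack of independence matters. This is my own illustration with made-up numbers, not data from the book: I assume each replica is down with some probability independently of the others, and that a common-cause event (say, a bad configuration push) takes every replica down at once with a much smaller probability.

```python
def p_all_replicas_down(p_independent, replicas, p_common_cause=0.0):
    """Probability that every replica of a service is down at the same time.

    Assumptions (hypothetical): each replica fails independently with
    probability `p_independent`, and a separate common-cause event knocks
    out all replicas together with probability `p_common_cause`.
    """
    independent_outage = p_independent ** replicas
    return p_common_cause + (1.0 - p_common_cause) * independent_outage


if __name__ == "__main__":
    # Made-up numbers: 1% per-replica downtime, 0.1% chance of a bad push.
    for replicas in (1, 2, 3, 5):
        hw_only = p_all_replicas_down(0.01, replicas)
        with_human = p_all_replicas_down(0.01, replicas, p_common_cause=0.001)
        print(f"{replicas} replicas: independent only {hw_only:.2e}, "
              f"with common cause {with_human:.2e}")
```

With independent faults, each extra replica multiplies the outage probability by the per-replica failure rate. With a common cause in the mix, the outage probability can never drop below the probability of that shared event, no matter how many replicas you add.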

The many statistics the authors give show that failures are pervasive in datacenters. This chapter explains how to analyze, predict, repair, and tolerate faults.
