Wednesday, November 28, 2007

Efficiency of Supercomputers - What a terrible misnomer!

At the last Supercomputing conference I listened to a talk by Ray Orbach, the director of the Department of Energy Office of Science. He spoke about the wake-up call that the Earth Simulator gave him soon after he came into office. To quote: "What shocked scientists and government officials was the Earth Simulator's efficiency: It had run at more than 85 percent of its 40-teraflops theoretical peak speed - a far greater percentage than U.S. supercomputers." From the context, it is quite clear that Ray Orbach thinks that high efficiency is good: scientists and government officials were shocked that the ES was so much more efficient than US systems. In truth, they should have been shocked by the terribly inefficient design of the ES that this high "efficiency" number reflects, and been glad that US platforms are so much better designed than the ES.

Why?

Suppose that Ray Orbach had learned that the Japanese Office of Science (if there is such a thing) had succeeded in utilizing the phones in its offices 85% of the time during working hours; no telephone sat idle for more than 9 minutes during any working hour. Would he be shocked by the efficiency of the Japanese telephone system? Probably not. If telephones are continuously in use, then, indeed, the usage of telephones is highly efficient. But this most likely causes other resources to be used very inefficiently: people calling the Office of Science get a busy tone almost all the time; people in the office who want to make calls may have to wait hours for a free phone. While phones are used more efficiently, people are used much less efficiently, and since people are much more expensive than telephones, the overall productivity of the organization decreases. Moral of the story: focus on using the expensive resources of your organization efficiently, even if this means using cheap resources less efficiently.

Are floating point units (FPUs) an expensive component in a modern CPU?

The answer is emphatically no. The FPUs consume only a few percent of the chip area or of the chip power budget. The bulk of the area and of the power budget is consumed by caches and by buses - by the devices that store and transfer information. Suppose, hypothetically, that by replacing some of the cache with an additional FPU one could increase the performance of a chip by 10%. It is a no-brainer that the change is an improvement. However, if the chip had an "efficiency" of 20% before the change, it now has an "efficiency" of 11%: while performance increased by 10%, peak performance increased by 100% (the number of FPUs doubled), so "efficiency" dropped to 1.1 x 20% / 2 = 11%. In Ray Orbach's world, the new system is much worse than the old system, as it is less "efficient".
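The arithmetic of this hypothetical can be checked with a short sketch. The numbers are the ones from the thought experiment above, not measurements of any real chip; the normalized peak of 1.0 and the helper name `efficiency` are assumptions made only for illustration.

```python
def efficiency(sustained: float, peak: float) -> float:
    """Fraction of theoretical peak that is actually achieved."""
    return sustained / peak


# Hypothetical baseline chip: peak normalized to 1.0, running at 20% of peak.
old_peak = 1.0
old_sustained = 0.20 * old_peak

# Hypothetical redesign: a second FPU doubles the peak (assumed),
# while delivered performance rises by only 10%.
new_peak = 2.0 * old_peak
new_sustained = 1.10 * old_sustained

old_eff = efficiency(old_sustained, old_peak)
new_eff = efficiency(new_sustained, new_peak)

print(f"old chip: sustained={old_sustained:.2f}, efficiency={old_eff:.0%}")
print(f"new chip: sustained={new_sustained:.2f}, efficiency={new_eff:.0%}")
```

The redesigned chip delivers more work (0.22 versus 0.20 in normalized units) yet reports roughly half the "efficiency", which is exactly why the ratio is a poor figure of merit.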

Ray Orbach should care about total factor productivity: how many flops (or better -- how many solved problems) he gets per dollar, not about the utilization of the floating point units. The main component of a chip (in area or power) is the memory subsystem: caches and buses. The main concern of a CPU designer should be that the caches and buses are utilized efficiently; if adding FPUs leads to a higher utilization of caches and buses, then the overall system efficiency has improved, even though FPU utilization is lower.