
Cluster in a Box

Sun brought one of their "clusters in a shipping container" solutions to the show and parked it out in the parking lot. I didn't get a chance to go inside the shipping container, but I peeked in. It's a very cool cluster idea (again, figuratively and literally). You bring it in on an 18-wheel truck, plug in the network, the chilled water, and the power, and bingo - a cluster. Since they are using chilled water, it's a very green solution. Rackable has a similar solution.


SC07 was the launch point for the Green500. It's a website devoted to listing the most power-efficient systems in the world. Dr. Wu-chun Feng, previously of Los Alamos and now at Virginia Tech, has been a champion of lower-power systems and started the Green500 to promote the idea of "Green Computing." The inaugural list, which coincides with the Top500 list, was announced at SC07. The top machine, actually the top five machines, are IBM Blue Gene/P systems. The #1 system is at the Science and Technology Facilities Council at the Daresbury Laboratory and achieved 357.23 MFLOPS per Watt. The #6 system was a Dell PowerEdge 1950 system at the Stanford University Biomedical Computational Facility and achieved 245.27 MFLOPS per Watt (this is the highest-ranking cluster on the list). The lowest-ranking machine was ASCI Q, an old AlphaServer system at Los Alamos. It achieved only 3.65 MFLOPS per Watt (ouch!).
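To put those numbers in perspective, here is a small sketch that compares the three efficiency figures quoted above. The efficiency values come straight from the list; the function names and the 1 TFLOPS target are my own illustrative assumptions, and the power estimate simply assumes a machine could hold its listed efficiency at that rate.

```python
# MFLOPS-per-Watt figures quoted above from the inaugural Green500 list.
EFFICIENCY_MFLOPS_PER_WATT = {
    "Blue Gene/P (Daresbury)": 357.23,
    "Dell PowerEdge 1950 (Stanford)": 245.27,
    "ASCI Q (Los Alamos)": 3.65,
}


def efficiency_ratio(a: str, b: str) -> float:
    """How many times more work per Watt system `a` delivers than system `b`."""
    return EFFICIENCY_MFLOPS_PER_WATT[a] / EFFICIENCY_MFLOPS_PER_WATT[b]


def watts_for(system: str, sustained_tflops: float) -> float:
    """Power needed to sustain a given rate, assuming (hypothetically) that
    the system holds its listed efficiency at that rate."""
    mflops = sustained_tflops * 1e6  # 1 TFLOPS = 1,000,000 MFLOPS
    return mflops / EFFICIENCY_MFLOPS_PER_WATT[system]


if __name__ == "__main__":
    top, cluster, bottom = EFFICIENCY_MFLOPS_PER_WATT
    print(f"{top} vs {bottom}: {efficiency_ratio(top, bottom):.1f}x")
    print(f"{top} vs {cluster}: {efficiency_ratio(top, cluster):.2f}x")
    for name in EFFICIENCY_MFLOPS_PER_WATT:
        # Illustrative 1 TFLOPS target; not a figure from the list.
        print(f"{name}: {watts_for(name, 1.0) / 1000:.1f} kW per TFLOPS")
```

By this arithmetic the top Blue Gene/P is nearly 100 times more efficient than ASCI Q, but less than 1.5 times more efficient than the best commodity cluster.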

This initial list is built from the November 2007 Top500 list, so it gives us, the HPC community, a good baseline for starting a Green500 list. In the future I hope that the list will expand to include smaller machines such as Microwulf. I think people will be surprised how power efficient smaller machines can be, particularly if they use diskless nodes.

Heterogeneous Computing

The second trend I see is somewhat opposite to Green Computing, but still has its merits if it works for your application. Since we are fundamentally limited by the same CPUs, hard drives, interconnects, and memory, the power consumption of the core systems is about the same. Also, CPU clock speeds are still increasing, but nowhere near the previous rate. So many people are looking at additional types of hardware for ways to accelerate baseline performance. This trend is often referred to as Heterogeneous Computing.

The current major contenders for Heterogeneous Computing are:

  • FPGAs (Field Programmable Gate Arrays)
  • Clearspeed
  • GPUs (Graphics Processing Units)
  • Cell processor from IBM and Sony

All four are devoted to providing great leaps in processing capability, at a good power/performance point, and hopefully, at a good price point. With the exception of the Cell processor, however, none of these technologies is designed to operate as a stand-alone system; the rest all need a host of some sort.

All of these technologies (and companies) are trying to provide increases in computing power in different ways. I think I've said this before, but a couple of years ago I placed my bet on GPUs. The reason for my bet is simple - commodity pricing. Commodity pricing brought down the big iron of HPC and commodity pricing is doing wonders for the electric car industry. So I think commodity pricing can help GPUs become the winner in the accelerator contest.

The other three technologies, FPGAs, Clearspeed, and Cell processors, are either very niche products, as in the case of Clearspeed, or somewhat niche products, as in the case of Cell processors. At the highest end, Cell processors are sold in perhaps the hundreds of thousands or low millions due to their use in the Sony PS3. But GPUs are sold by the tens of millions every year (last year Nvidia sold over 95 million GPUs). The crazy gamers out there who have to have the latest, greatest, and fastest GPU(s) so they can enjoy their games have been pushing the GPU market really hard the last several years. Plus, people now have multiple machines in their homes - multiple computers and game consoles - all of which have GPUs in them. So this means that GPUs have become commodities. I can go into any computer store anywhere and find very fast graphics cards. Heck, I can even go into Walmart and find them! (When you're in Walmart, you have arrived.) So Nvidia and ATI (AMD) can spread development costs across tens of millions of GPUs, allowing them to sell the cards for a low price. God bless those gamers.

The other technologies simply don't have this commodity market working for them. This means they have to spread their development costs over a much, much smaller number of products, which forces the prices way up. This is why I think that GPUs will be the winner in this accelerator contest. Also, I'm not alone in this belief.
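The amortization argument is easy to see with a little arithmetic. The sketch below is purely illustrative: the 95 million unit figure is the Nvidia number mentioned above, but the development cost and the niche-product volume are hypothetical values I made up to show the shape of the effect, and it simplifies things by assuming both products cost the same to develop.

```python
def dev_cost_per_unit(development_cost: float, units_sold: int) -> float:
    """Share of the development cost that each unit sold has to carry."""
    if units_sold <= 0:
        raise ValueError("units_sold must be positive")
    return development_cost / units_sold


# Hypothetical development budget, in dollars (an assumption, not a real figure).
ASSUMED_DEV_COST = 400e6
# Annual GPU volume quoted in the article.
GPU_UNITS = 95_000_000
# Hypothetical volume for a niche accelerator (an assumption, not a real figure).
NICHE_UNITS = 50_000

if __name__ == "__main__":
    gpu = dev_cost_per_unit(ASSUMED_DEV_COST, GPU_UNITS)
    niche = dev_cost_per_unit(ASSUMED_DEV_COST, NICHE_UNITS)
    print(f"Commodity GPU: ${gpu:,.2f} of development cost per unit")
    print(f"Niche part:    ${niche:,.2f} of development cost per unit")
```

With these made-up inputs, each commodity GPU carries a few dollars of development cost while each niche part carries thousands. The exact numbers don't matter; the ratio of the volumes does.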

I think everyone saw the AMD announcement about a double precision GPU card for computation. The board has 2GB of memory (the largest that I know of on a GPU), uses 150W of power (while it sounds like a lot, it isn't too bad), and costs $2,000 (that's a bit out of the commodity range). In addition, AMD is finally going to offer a programming kit for the GPU. They will be offering a derivative of Brook called Brook+.

Nvidia was showing their Tesla GPU computing product. I stopped by the booth and I was amazed. They were showing off a 1U box with four Teslas on board that provides well over 1 TFLOPS in performance. Here is a picture of one.

Figure Two: Four Teslas in a 1U box

Just behind the 1U box on the left hand corner you can see a Tesla card and you can see the back of the card. Notice that there isn't a video connection :) You can also connect multiple 1U boxes using what I think is a PCI-e connector. Here's a picture of a Tesla cluster as well.

Figure Three: A Tesla cluster

Looking at the bottom of the rack (it's a half-rack), you can see the row of fans in the front of the Tesla 1U nodes. You will also see four 1U nodes in the rack. I'm not sure what the other nodes are in the rack. The Tesla nodes are connected via a PCI-e cable.

Nvidia has released a programming tool called CUDA. It's available for free and uses basic C commands with new data types. Basically, CUDA is a compiler that compiles the GPU-specific commands and spits out non-GPU code that you can compile with whatever C compiler you want. I spoke with a couple of their developers whom I know very well. They say it's very easy to write code with CUDA. These guys are very bright (actually extremely bright) so your mileage may vary, but in general, I trust their opinions. There are even some rumors of some kind of Fortran extensions for CUDA. So go out and get your G80 or better card and start coding!!


This work is licensed under CC BY-NC-SA 4.0

©2005-2023 Copyright Seagrove LLC, Some rights reserved. Except where otherwise noted, this site is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International. The Cluster Monkey Logo and Monkey Character are Trademarks of Seagrove LLC.