Cluster in a Box
Sun brought one of their "clusters in a shipping container" solutions to the show, out in the parking lot. I didn't get a chance to go into the shipping container, but I peeked inside. It's a very cool cluster idea (again, figuratively and literally). You bring it in on an 18-wheeler, plug in the network, the chilled water, and the power, and bingo: a cluster. Since it is cooled with chilled water rather than room air, it's a very green solution. Rackable has a similar solution.
Green500
SC07 was the launch point for the Green500, a website devoted to listing the 500 most energy-efficient systems in the world. Dr. Wu-chun Feng, previously of Los Alamos and now at Virginia Tech, has been a champion of lower-power systems and started the Green500 to promote the idea of "Green Computing." The inaugural list, which coincides with the release of the Top500 list, was announced at SC07. The top machine, and in fact the top five machines, are IBM Blue Gene/P systems. The #1 system is at the Science and Technology Facilities Council's Daresbury Laboratory and achieved 357.23 MFLOPS per Watt. The #6 system was a Dell PowerEdge 1950 system at the Stanford University Biomedical Computational Facility, which achieved 245.27 MFLOPS per Watt (this is the highest-ranking cluster on the list). The lowest-ranking machine was ASCI Q, an old AlphaServer system at Los Alamos. It achieved only 3.65 MFLOPS per Watt (ouch!).
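The Green500 ranks systems by energy efficiency: sustained Linpack performance (the Rmax value reported for the Top500) divided by the power the system draws, expressed in MFLOPS per Watt. The short Python sketch below shows that arithmetic and sorts the three efficiency figures quoted above. The helper names and the 10,000 GFLOPS / 40 kW machine are hypothetical; only the three MFLOPS-per-Watt values come from the list.

```python
# Minimal sketch of the Green500-style efficiency metric (MFLOPS per Watt).
# The function names and the hypothetical 10,000 GFLOPS / 40 kW machine are
# illustrative only; the three efficiency figures are the ones quoted above.


def mflops_per_watt(rmax_gflops: float, power_kw: float) -> float:
    """Convert sustained Linpack performance (GFLOPS) and power (kW) to MFLOPS/W."""
    if power_kw <= 0:
        raise ValueError("power must be positive")
    # 1 GFLOPS = 1,000 MFLOPS and 1 kW = 1,000 W, so the scale factors cancel.
    return (rmax_gflops * 1000.0) / (power_kw * 1000.0)


def rank_by_efficiency(systems):
    """Return (name, MFLOPS/W) pairs sorted from most to least efficient."""
    return sorted(systems.items(), key=lambda item: item[1], reverse=True)


# Efficiencies quoted from the inaugural (November 2007) Green500 list.
reported = {
    "Blue Gene/P, STFC Daresbury": 357.23,
    "ASCI Q, Los Alamos": 3.65,
    "Dell PowerEdge 1950, Stanford": 245.27,
}

if __name__ == "__main__":
    # Hypothetical machine: 10,000 GFLOPS sustained while drawing 40 kW.
    print(mflops_per_watt(10_000, 40))  # 250.0
    for name, eff in rank_by_efficiency(reported):
        print(f"{name}: {eff} MFLOPS/W")
    # How many times more efficient the best system is than the worst.
    print(round(357.23 / 3.65))  # 98
```

Put another way, the #1 Blue Gene/P does roughly 98 times as much Linpack work per Watt as ASCI Q.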
This initial list is built from the November 2007 Top500 list, so it gives us, the HPC community, a good baseline for the Green500. In the future I hope the list will expand to include smaller machines such as Microwulf. I think people will be surprised at how power-efficient smaller machines can be, particularly if they use diskless nodes.
Heterogeneous Computing
The second trend I see is somewhat opposite to Green Computing, but it still has merit if it works for your application. Since we are all fundamentally limited to the same CPUs, hard drives, interconnects, and memory, the power consumption of the core systems is about the same. Also, CPU clock speeds are still increasing, but nowhere near their previous rate. So many people are looking to additional types of hardware to accelerate performance beyond that baseline. This trend is often referred to as Heterogeneous Computing.
The current major contenders for Heterogeneous Computing are:
- FPGAs (Field Programmable Gate Arrays)
- ClearSpeed
- GPUs (Graphics Processing Units)
- Cell processor from IBM and Sony
All four aim to provide great leaps in processing capability at a good power/performance point and, hopefully, at a good price point. With the exception of the Cell processor, however, none of these technologies is designed to operate as a standalone system; they all need a host of some sort.
All of these technologies (and companies) are trying to increase computing power in different ways. I think I've said this before, but a couple of years ago I placed my bet on GPUs. The reason for my bet is simple: commodity pricing. Commodity pricing brought down the big iron of HPC, and commodity pricing is doing wonders for the electric car industry. So I think commodity pricing can help GPUs become the winner in the accelerator contest.
The other three technologies (FPGAs, ClearSpeed, and Cell processors) are either very niche products, as in the case of ClearSpeed, or somewhat niche products, as in the case of Cell processors. At the high end, Cell processors sell in perhaps the hundreds of thousands or low millions, thanks to their use in the Sony PS3. GPUs, on the other hand, sell in the tens of millions every year (last year Nvidia sold over 95 million GPUs). The crazy gamers out there who have to have the latest, greatest, fastest GPU(s) for their games have been pushing the GPU market really hard for the last several years. Plus, people now have multiple machines in their homes (multiple computers and game consoles), all of which have GPUs in them. This means that GPUs have become commodities. I can go into any computer store anywhere and find very fast graphics cards. Heck, I can even go into Walmart and find them! (When you're in Walmart, you have arrived.) So Nvidia and ATI (AMD) can spread development costs across tens of millions of GPUs, allowing them to sell the cards for a low price. God bless those gamers.
The other technologies simply don't have this commodity market working for them.
This means they have to spread their development costs over a much, much smaller
number of products, which forces the prices way up. This is why I think that GPUs
will be the winner in this accelerator contest, and I'm not alone in this belief.
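The commodity argument boils down to simple arithmetic: a fixed development cost divided over the number of units sold, plus what each unit costs to build. Here is a minimal Python sketch of that calculation. All of the dollar figures and unit volumes are hypothetical, chosen only to show the shape of the effect; they are not actual GPU or accelerator costs.

```python
# Hypothetical illustration of spreading a fixed development cost over unit volume.
# None of these numbers are real GPU or accelerator figures.


def unit_cost(dev_cost: float, units: int, build_cost: float) -> float:
    """Break-even price per unit: amortized development cost plus build cost."""
    if units <= 0:
        raise ValueError("units must be positive")
    return dev_cost / units + build_cost


DEV_COST = 100_000_000.0  # hypothetical: $100M to design a chip
BUILD_COST = 50.0         # hypothetical: $50 to manufacture each unit

if __name__ == "__main__":
    commodity = unit_cost(DEV_COST, 10_000_000, BUILD_COST)  # mass market
    niche = unit_cost(DEV_COST, 100_000, BUILD_COST)         # small market
    print(commodity)          # 60.0
    print(niche)              # 1050.0
    print(niche / commodity)  # 17.5
```

Same design effort and same build cost, yet under these assumed numbers the low-volume part has to sell for 17.5 times as much just to break even.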
I think everyone saw the AMD announcement of a double-precision GPU card for computation. The board has 2GB of memory (the largest I know of on a GPU card), uses 150W of power (while that sounds like a lot, it isn't too bad), and costs $2,000 (a bit out of the commodity range). In addition, AMD is finally going to offer a programming kit for the GPU: a derivative of Brook called Brook+.
Nvidia was showing its Tesla GPU computing product. I stopped by the booth and was amazed. They were showing off a 1U box with four Teslas on board, providing well over 1 TFLOPS of performance. Here is a picture of one.
Figure Two: Four Teslas in a 1U box
Just behind the 1U box, in the left-hand corner, you can see a Tesla card, including the back of the card. Notice that there isn't a video connector :) You can also connect multiple 1U boxes using what I think is a PCI-e connector. Here's a picture of a Tesla cluster as well.
Figure Three: A Tesla cluster
Looking at the bottom of the rack (it's a half-rack), you can see the row of fans on the front of the Tesla 1U nodes. There are four of these Tesla 1U nodes in the rack. I'm not sure what the other nodes in the rack are. The Tesla nodes are connected via a PCI-e cable.
Nvidia has released a free programming tool called CUDA. It uses standard C plus a few extensions and new data types. Basically, CUDA's compiler compiles the GPU-specific code and spits out the non-GPU (host) code, which you then compile with a regular C compiler. I spoke with a couple of their developers whom I know very well. They say it's very easy to write code with CUDA. These guys are very bright (actually, extremely bright), so your mileage may vary, but in general I trust their opinions. There are even rumors of Fortran extensions for CUDA. So go out and get your G80 or better card and start coding!
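To give a feel for the CUDA programming model, here is a plain-Python emulation of how a CUDA kernel divides work among threads. This is not CUDA code and it runs on the CPU. In real CUDA you would write the per-thread function as a `__global__` C function, launch it with the `<<<blocks, threads>>>` syntax, and read `blockIdx.x`, `blockDim.x`, and `threadIdx.x` as built-in variables; here those are ordinary arguments, and the function names are hypothetical. The example computes a*x + y element by element (SAXPY, a classic first GPU kernel), writing into a separate output list.

```python
# Plain-Python emulation of the CUDA thread-indexing model (this is not CUDA code).
# In a real CUDA kernel, blockIdx.x, blockDim.x and threadIdx.x are built-in
# variables; here they are ordinary function arguments. Names are hypothetical.


def saxpy_thread(a, x, y, out, block_idx, block_dim, thread_idx):
    """Work done by one emulated GPU thread: out[i] = a * x[i] + y[i]."""
    # Same global-index formula a CUDA kernel would use.
    i = block_idx * block_dim + thread_idx
    # Bounds check: the last block may have more threads than remaining elements.
    if i < len(x):
        out[i] = a * x[i] + y[i]


def launch_saxpy(a, x, y, block_dim=4):
    """Emulate a 1-D grid launch with enough blocks to cover every element."""
    n = len(x)
    grid_dim = (n + block_dim - 1) // block_dim  # ceiling division
    out = [0.0] * n
    # On a GPU all of these threads run in parallel; here they run one by one.
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            saxpy_thread(a, x, y, out, block_idx, block_dim, thread_idx)
    return out


if __name__ == "__main__":
    # 5 elements with 4 threads per block: 2 blocks, 3 idle threads.
    print(launch_saxpy(2.0, [1.0, 2.0, 3.0, 4.0, 5.0], [10.0] * 5))
    # [12.0, 14.0, 16.0, 18.0, 20.0]
```

The bounds check is the part people most often forget: the grid is rounded up to whole blocks, so some threads in the last block have no element to work on.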