Power Usage and GigE Switches - Let's Talk

Published on Wednesday, 28 June 2006 05:22
Written by Jeff Layton

Power Usage and GigE switches - Sounds like Lunch to me!

The Beowulf mailing list provides detailed discussions about issues concerning Linux HPC clusters. In this article I review postings on power usage (always a good topic) from two discussion threads on the list, along with some discussions about 24-port GigE switches from 2004. Plus a bonus update to the age-old question: Where can I get a cheap GigE switch?

Max FLOPS/Watt

Producing the most performance for a given power level is becoming an increasingly important issue. Given that Intel has stopped development of ever more powerful CPU cores because of their huge power consumption (the new Prescott dissipates about 103W), this question has taken on new meaning. On February 16, 2004, Camm Maguire wanted to find the maximum flops (floating point operations per second) to watts ratio for a cluster. Joshua Baker-LePain was the first to reply. He thought the Dell Optiplex SX270 provided a good ratio. These boxes consist of an Intel P4 with Hyper-Threading at up to 3.2 GHz with on-board GigE, a laptop-style hard drive, and a simple 150 watt power supply.
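To make the metric concrete, here is a minimal sketch of how you might rank candidate nodes by FLOPS per watt. The peak GFLOPS figures below are rough illustrative assumptions of mine, not measurements from the thread; only the wall-watt numbers echo figures mentioned in the discussions.

```python
# Rank hypothetical cluster nodes by peak GFLOPS per wall watt.
# The GFLOPS values are illustrative assumptions, not thread data.

nodes = {
    # name: (assumed peak GFLOPS, watts at the wall per node)
    "P4 3.2 GHz (SX270-like)": (6.4, 150),
    "Via Eden 1 GHz":          (1.0, 7),
    "Dual Opteron 242":        (6.4, 182),
}

def gflops_per_watt(gflops, watts):
    """Simple efficiency metric: peak GFLOPS divided by wall watts."""
    return gflops / watts

# Sort the candidates, most power-efficient first.
ranked = sorted(nodes.items(),
                key=lambda kv: gflops_per_watt(*kv[1]),
                reverse=True)

for name, (gflops, watts) in ranked:
    print(f"{name:26s} {gflops_per_watt(gflops, watts):.3f} GFLOPS/W")
```

With these (assumed) numbers the low-power Via Eden comes out on top per watt, even though a single node delivers far fewer total FLOPS, which is exactly the trade-off Jim raises next.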


At this point, the discussion broke into two very interesting threads. The first began with a reply from Jim Lux, the "space" cluster expert. Jim pointed out that this topic is near and dear to his heart because he designs power-constrained systems for space-based computations. His first reply asked some very good questions. Was Camm interested in the power consumption at the wall plug per node? Or the power consumption for the CPU only? Should memory be included? How about networking power consumption? Jim pointed out that the networking part of the power equation is interesting because you could have a number of low-power nodes with a high-speed interconnect that itself uses a fair amount of power. In that case, Jim thought, it might be better to use a few high-power nodes to save on the power for the networking. He also pointed out that you could forgo the network switch and connect the nodes in a torus or cube fashion, but this will cost some CPU performance since the nodes have to route the data.

Andrew Carter responded to Jim's posting with a quick comment. He mentioned that something like the Via Eden/Nehemiah running at 1 GHz uses about 7W of power, and that the Mars Rovers use about 5W of power to transmit back to Earth. Jim Lux responded that the power number was pretty accurate, but it requires a nice cryogenically cooled receiver on Earth and hundreds of kilowatts to transmit from Earth to Mars. It makes you wonder: if we colonize planets and other bodies in space, could we make an interplanetary Grid?

The second thread started with some postings from William Dieter, who pointed out a cluster design tool at Aggregate.org that allows you to design based on power. It lets you input cluster parameters such as memory, hard disk space, etc. and weight the resulting designs based on power consumption. The designs are based on a set of rules for individual parts. Bogdan Costescu responded that the rules could be somewhat simplistic: reducing the power of the NIC, for example, could drive you to a NIC that increases the load on the CPU. William responded that the rules aren't perfect and there is still some development to be done (volunteers anyone?).

This type of discussion, power vs. performance, is going to become more common as processor power consumption increases dramatically.

Power Consumption for Opterons

The previous discussion made some points about power consumption. Another discussion, centered on the Opteron, developed as well. As with many narrow topics, it became much more general, providing some interesting ideas and comments.

On March 9, 2004, Trent Piepho asked about power consumption for dual Opterons measured with a "kill-a-watt" type device (you plug the device into a wall outlet and then plug your node into the device to measure power usage). There were a number of replies with some very good numbers. Bill Broadley posted figures for a Sun Fire V20z (dual 2.2 GHz Opterons) with dual SCSI drives and 4 GB of memory. Idling, the box drew 237-249 watts. When running a code called pstream, which is related to the Stream benchmark, Bill measured 260-277 watts for one instance of pstream and 265-280 watts for two instances. The unsinkable Robert Brown posted numbers for a dual Opteron machine he was testing (dual Opteron 242s) under a load average of 2. He measured 150 watts using his kill-a-watt device. He thought this was quite good and remarked that it was better than a dual Athlon that was just idling. Andrew Wang thought both sets of results were very good and mentioned that an idling Itanium2 used about 120 watts for the CPU alone. Mark Hahn also posted that his dual Opteron (dual 240s) used about 250 watts maximum while running two copies of Stream and one copy of Bonnie++. Robert Brown then posted that he tested his dual Opteron again and, with a load factor of 3, measured about 182 watts.

There was some discussion about Robert's numbers because they were much lower than either Bill's or Mark's. Jim Lux jumped in to say that the number sounded reasonable (Jim was taking a break from building his space-based Death Star cluster). The 180 watts at the wall converts to about 140 watts DC, which allows about 100 watts total for the CPUs, 10 watts or so for the fans, and 30 watts for the board logic and RAM.
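Jim's back-of-the-envelope conversion is easy to reproduce. The sketch below assumes a power-supply efficiency of roughly 78% (my assumption; Jim didn't state one, but supplies of that era typically ran 70-80%), which turns 180 W at the wall into about 140 W on the DC side:

```python
# Reproduce the back-of-the-envelope power budget from the thread.
# PSU_EFFICIENCY is an assumption; the thread gives only the AC and
# approximate DC figures.

PSU_EFFICIENCY = 0.78

ac_watts = 180                        # measured at the wall (kill-a-watt)
dc_watts = ac_watts * PSU_EFFICIENCY  # ~140 W available on the DC side

# Jim's rough allocation of the DC power:
budget = {
    "CPUs (2x Opteron)": 100,
    "fans":              10,
    "board logic + RAM": 30,
}

print(f"DC power available: {dc_watts:.0f} W")
print(f"DC power budgeted:  {sum(budget.values())} W")
```

The budgeted components sum to 140 W, matching the available DC power, which is why Jim found Robert's wall-plug reading plausible.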

Trent Piepho also posted some numbers for various Intel Pentium III and Xeon machines under various types of loads. The dual 2.4 GHz Xeon box was a pretty hefty box with a total of 16 SATA drives, a single IDE drive, 1 GB of memory, CD-ROM, and 6 high-speed case fans (measured at 4.4 watts each). When running RAID and Bonnie++, the box was using about 534 watts (Ouch).


Good GigE Switches?

On March 5, 2004, Russell Nordquist posted to the Beowulf list looking for recommendations for good 24-port GigE switches. He asked for input but wanted to stay away from "low-end" vendors (pick your definition of "low-end" here). There were a number of recommendations, but the thread ended with a very nice tool for testing switches. However, let's start with the recommendations. Please remember that these are personal recommendations from individuals on the Beowulf mailing list. Their experiences don't reflect any endorsement from this website or me. As always, YMMV (Your Mileage May Vary): test your codes.

Update: There was a recent and interesting discussion in March of 2006. Look for "Gigabit switch recommendations" by Joshua Baker-LePain. The thread continues into April, where Joshua put an SMC SMC8748L2 through a combined NetPIPE and netperf testing regime and found it held up quite well.

The very experienced Mark Hahn posted that he liked the SMC 8624T switches. He mentioned that the number 140 cluster in the Top500 used them (this was before the June 2004 Top500 list). Lars Henriksen posted that they used HP 2724 switches. He said they were stable under heavy load and cost about $2,000 for 24 ports. However, Joel Jaeggli pointed out that the HP switches can't do jumbo frames, which could be an issue for applications whose performance benefits from them. Gerry Creager posted that he had just bought some Foundry EdgeIron 24-port switches for about $3,000 a switch. He also mentioned a few personal opinions and facts: he doesn't like 3Com switches for clusters; HP has been using Foundry components (he couldn't confirm that); and he would stay away from Asante. Trent Piepho asked what people thought of the Dell unmanaged 2624 24-port GigE switch that was selling for around $330. There was no response.

The final post of this thread is from the very knowledgeable Bill Broadley, who posted a link to a tool he wrote to test the performance of switches. It uses MPI_Send to send various size packets between sets of nodes (something like NetPIPE) through the switch. The tool can measure latency and bandwidth estimates through the switch and also watch for network saturation. [Note: Also check out Microway's MPI Link-Checker (tm)]

In a similar thread, on May 21, 2004, Konstantin Kudin asked what people recommended for good Gigabit switches as he had heard stories of inexpensive switches that choked under heavy load.

The first response was from William Harman, who recommended two 1U HP ProCurve switches: the ProCurve 2824 (list price of $2,499 for 24 GigE ports) and the ProCurve 2848 (list price of $4,899 for 48 ports). The 2824 has a backplane bandwidth of 48 Gbps (Gigabits/second) and the 2848 has a backplane bandwidth of 96 Gbps. Michael Hanulec recommended the Nortel BayStack 5510 series of switches. Michael said it has a backplane bandwidth of 160 Gbps in a single chassis, and 1280 Gbps when eight switches are stacked together. Mark Hahn also posted that he thought cheaper/smaller switches from smaller, less well-known companies are not necessarily worse, because they probably use the same chips as the larger manufacturers.
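A quick sanity check on those backplane figures: for a switch to be non-blocking, every port must be able to send and receive at line rate simultaneously, so the backplane needs ports x line rate x 2 of bandwidth. A short sketch (my arithmetic, not from the thread):

```python
# Non-blocking backplane requirement for a full-duplex switch:
# every port can transmit and receive at line rate at the same time,
# so required bandwidth = ports * line_rate * 2.

def required_backplane_gbps(ports, line_rate_gbps=1.0):
    """Full-duplex, non-blocking backplane requirement in Gbps."""
    return ports * line_rate_gbps * 2

print(required_backplane_gbps(24))  # 48.0 -- matches the 2824's 48 Gbps
print(required_backplane_gbps(48))  # 96.0 -- matches the 2848's 96 Gbps
```

Both HP switches quote exactly the non-blocking figure for their port counts, i.e. neither is oversubscribed on paper.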


Sidebar One: Links Mentioned in Column

kill-a-watt

Bonnie++

pstream.c

nrelay.c

This article was originally published in ClusterWorld Magazine. It has been updated and formatted for the web. If you want to read more about HPC clusters and Linux you may wish to visit Linux Magazine.

Jeff Layton has been a cluster enthusiast since 1997 and spends far too much time reading mailing lists. He can be found hanging around the Monkey Tree at ClusterMonkey.net (don't stick your arms through the bars though).