The Beowulf mailing list provides detailed discussions about issues concerning Linux HPC clusters. In this article I review some postings to the Beowulf list on the new (at the time) Orion desktop cluster, which bring up some observations about low-power processors in clusters, and on which Linux distribution people recommend for starting out with clusters.
On August 30, 2004 the ever-present Eugen Leitl forwarded the Orion Multisystems announcement to the Beowulf mailing list. If you recall, Orion is (was) producing a desktop cluster and a desk-side cluster. Russell Nordquist was the first to respond with two main observations: (1) the top brass at Orion are from Transmeta, and (2) the cluster wasn't a shared memory setup, which he thought would increase latency. Russell also asked about the Efficeon CPU, particularly compared to the Opteron. Russell said a 4-way Opteron box was about the same price as the 12 CPU Orion box and was curious about the performance comparison.
In response to Russell's query, Jim Lux posted that he thought Orion's market (offices) might have power consumption and noise issues with a quad Opteron and wondered what the numbers were. Jim then postulated that if a 4-way Opteron were equal to 12 Efficeons, then the 96 CPU Orion box would be like 32 Opterons. Then Jim went on to say that if the Opterons were 100W each, then 32 Opterons would draw 3200W, which is much larger than the 1500W-1800W range claimed by Orion.
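Jim's back-of-the-envelope arithmetic can be written out as a short Python sketch. The figures are the ones quoted in the thread (4 Opterons treated as equal to 12 Efficeons, 100W per Opteron, 1500W-1800W claimed for the 96 CPU Orion); the function and constant names are made up for illustration, and the equivalence itself is Jim's supposition, not a measurement.

```python
# Jim Lux's rough comparison, using only figures quoted in the thread.
# All names here are illustrative; the 4:12 equivalence is an assumption.
OPTERONS_PER_BOX = 4             # one 4-way Opteron box ...
EFFICEONS_PER_BOX = 12           # ... assumed equal to 12 Efficeons
OPTERON_WATTS = 100              # Jim's assumed draw per Opteron
ORION_CLAIMED_WATTS = (1500, 1800)  # range claimed for the 96 CPU Orion


def equivalent_opterons(efficeons):
    """Number of Opterons assumed to match the given number of Efficeons."""
    return efficeons * OPTERONS_PER_BOX // EFFICEONS_PER_BOX


def opteron_power(efficeons):
    """Power in watts drawn by the equivalent Opteron count."""
    return equivalent_opterons(efficeons) * OPTERON_WATTS


if __name__ == "__main__":
    cpus = 96
    watts = opteron_power(cpus)
    low, high = ORION_CLAIMED_WATTS
    print(f"{cpus} Efficeons ~ {equivalent_opterons(cpus)} Opterons at {watts} W")
    print(f"That is {watts / high:.1f}x to {watts / low:.1f}x the Orion's claimed draw")
```

Under these assumptions the Opteron equivalent draws roughly 1.8 to 2.1 times the power claimed for the Orion, counting CPUs only.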
Russell responded that he found some data for a 4-way Opteron at 1.8 GHz. With some magic to scale the numbers to 2.4 GHz he thought the linpack performance for a 4-way Opteron would be similar to the 12 CPU Orion desktop box. So Russell thought that Jim's comments were well taken and that the Orion box wins on heat and noise.
Mark Hahn posted that he thought the performance of the Efficeons relative to the Opterons was about right. He also mentioned that, by the standards of typical HPC clusters, the Orion's memory capacity and bandwidth are low. So he thought that the Orion box might be good for cache-friendly things like sequence-oriented bioinformatics codes or Monte Carlo stuff that uses small models. He went on to say that he thought the main appeal of the Orion machines was the tidiness/integration/support. For comparison, he said you could put together 18 Apple Xserves that would deliver about the same GFLOPS, but dissipate 2-3 times as much power and take up about twice the space. But he thought that "chicks" would dig the stack of Xserves more (I didn't know chickens were into clustering. Hmmm... I need to rewatch "Chicken Little".)
Glen Gardner posted that he has been touting the virtues of low power clusters for a while (along with many other people). He found them to be very effective and they were the only way to get his 14-node cluster in his apartment. They cost him about $20 a month in power/cooling and are on 24/7 and in use much of the time. Glen thought that the ability to have a good performing cluster in your cubicle (or apartment), which has low power requirements and low noise, was a very attractive one. He also thought the price/performance for the desktop unit was very good.
Mark Hahn responded that the Orion would be good at certain tasks but not good at more traditional HPC applications. Mark also asked some questions about the necessity of having a cluster on your desk. He mentioned that he could do many or most things on his cluster remotely.
In response, Robert Brown said that he loved his home cluster and felt it served many useful purposes. He said he did lots of challenging things on his home cluster and even did production work on it.
Michael Will also mentioned that AMD has two low power Opteron versions. They have an HE version that is specified at 55W, but it was twice the cost of standard Opterons. There is also an EE version that is specified at 30W (Note: these are for the Socket 940 CPUs. The new Socket F CPUs may fall into other power ranges). Michael also asked what is the range of per year cost for a 1U dual Opteron for air conditioning and power consumption.
A fairly long thesis about power consumption and cooling was provided by Robert Brown. As with virtually all of Robert's postings, except the ones disparaging Fortran, it is well worth reading. Robert made a very good initial estimate of about $200 a year to power and cool a dual Opteron machine ($1/Watt/Year). Robert went on to estimate when a more energy efficient but slower CPU would be worth the monetary savings in energy (power and cooling).
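Robert's rule of thumb is easy to turn into a small calculator. The sketch below is not Robert's actual analysis: it only applies the $1/Watt/Year figure, and the payback function is a simplification that ignores the lower performance of the more efficient CPU. The 200W draw is what $200 a year implies at that rate; the 45W saving (Jim's assumed 100W Opteron versus the 55W HE part) and the $90 price premium are hypothetical inputs chosen for illustration.

```python
# Rule-of-thumb energy cost, assuming $1 per watt per year for power plus
# cooling (the figure quoted from Robert Brown). Names are illustrative.
COST_PER_WATT_YEAR = 1.0  # dollars


def annual_energy_cost(watts, rate=COST_PER_WATT_YEAR):
    """Yearly cost in dollars to power and cool a machine drawing `watts`."""
    return watts * rate


def payback_years(price_premium, watts_saved, rate=COST_PER_WATT_YEAR):
    """Years until a pricier, lower-power CPU pays for itself in energy.

    Simplification: ignores any difference in performance between the CPUs.
    """
    if watts_saved <= 0:
        raise ValueError("watts_saved must be positive")
    return price_premium / (watts_saved * rate)


if __name__ == "__main__":
    # A dual Opteron machine drawing about 200 W (implied by $200/year).
    print(f"${annual_energy_cost(200):.0f} per year")
    # Hypothetical: $90 extra per CPU to save 45 W.
    print(f"{payback_years(90, 45):.1f} years to break even")
```

A fuller estimate would also charge the slower CPU for the extra run time each job needs, which is the trade-off Robert's posting addresses.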
Jim Lux had a companion posting to Robert's, answering Michael's question. Jim made the very good point that computing costs based purely on energy savings is not correct, because the cost is a nonlinear function. For example, he said he could add a cluster with 1500W to his office dissipation and it would only cost about $400/year. However, this would mean an additional air conditioning unit would have to be installed that would cost as much as the computer. Jim made a very good point: "I doubt that anyone can cost justify using more lower powered processors (i.e. fewer watts/GFLOPS) on a purely dollars/completed machine operation basis, except in some fairly unusual environments (i.e. a spacecraft, where every watt/joule is precious and expensive). The real value (to my mind) is making a cluster a minimal-hassle item, the same way a desktop PC is perceived today."
Jim went on to explain in more detail what he meant by making a cluster an appliance rather than something that is created when one needs it.
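The nonlinearity in the air conditioning example can be sketched as a step function: the energy bill grows linearly with the added load, but a one-time cost appears as soon as the load exceeds what the existing cooling can absorb. The $400/year for 1500W comes from the posting; the 1000W cooling headroom and the $10,000 air conditioner price are hypothetical placeholders, and the function name is made up for illustration.

```python
# Step-function sketch of Jim Lux's point. The energy rate is derived from
# his $400/year for 1500 W; headroom and A/C price are hypothetical.
def first_year_cost(added_watts, dollars_per_watt_year,
                    cooling_headroom_watts, new_ac_cost):
    """Energy cost for one year plus a one-time A/C unit if headroom is exceeded."""
    energy = added_watts * dollars_per_watt_year
    cooling_upgrade = new_ac_cost if added_watts > cooling_headroom_watts else 0.0
    return energy + cooling_upgrade


if __name__ == "__main__":
    rate = 400 / 1500   # dollars per watt per year, from Jim's figures
    headroom = 1000     # hypothetical spare cooling capacity in watts
    ac_cost = 10_000    # hypothetical price of an extra A/C unit in dollars
    for watts in (500, 1000, 1500):
        cost = first_year_cost(watts, rate, headroom, ac_cost)
        print(f"{watts:>5} W added -> ${cost:,.0f} in the first year")
```

With these placeholder numbers the cost jumps by the full price of the air conditioner between 1000W and 1500W, which is why a per-watt figure alone can mislead.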
At this point the discussion moved into one of COTS (Commercial Off The Shelf) versus a more proprietary cluster. The discussion was interesting and broke down into a discussion of what COTS really means. Overall this thread was very interesting because Orion is (was) trying something new and exciting and it appears that in general the cluster community appreciates that.
Discussing Linux distributions for clusters is always a fun time. There are plenty of personal opinions, but good information always seems to surface. On Sept. 8, 2004, Jeff Dragovich said that he had a small 10-node cluster for running a finite element program using MPI and wondered what flavor of Linux would be the best to use.
Tim Mattox was the first to reply and recommended choosing a Linux distribution that is supported by the cluster management tool you select. Tim recommended using Warewulf, which only requires an RPM-based distribution. He mentioned that he preferred the new cAos distribution: he uses cAos-1 and has found it to be very stable and very easy to install and maintain. He also said that he is anxiously awaiting cAos-2 (note: cAos-3 is under development and a "beta" is available).
Robert Brown (rgb) posted some good points. He started by saying that whatever distribution Jeff chooses, he should become adept at using it. rgb went on to say that he had some trouble with FC1, but FC2 was working fine. He made a quick summary of other distributions including Debian. He then put on his "rant" cap and talked about his problem with major cluster distributions. In his opinion, the best way to install a cluster is from a repository via PXE or something like kickstart, where the only things that differ between a head node and a compute node are the packages chosen and some post-install scripts.
There were some further discussions about using Debian as a cluster distribution. Not to belittle Debian, but indications are that it hasn't been used in many clusters up to this point. However, that could change given the wonderful plethora of desktop Debian distributions available.
Erwan from Mandrakesoft posted to correct some comments from rgb. He went on to discuss how CLIC, Mandrakesoft's GPL-ed cluster distribution, could help those new to clusters. From Erwan's description it sounds like a very good cluster distribution.
Jeff Layton has been a cluster enthusiast since 1997 and spends far too much time reading mailing lists. He can be found hanging around the Monkey Tree at ClusterMonkey.net (don't stick your arms through the bars though).