Cluster mechanicals and cluster distros - Me likes!

The Beowulf mailing list provides detailed discussions about issues concerning Linux HPC clusters. In this article I review postings to the Beowulf list on three topics: using a single power supply for multiple nodes, cooling and general machine room design, and cluster distribution concepts.

One Power Supply For Multiple Nodes

More and more people are thinking about custom cases and custom mountings for their clusters. On June 28, 2004, Frank Joerdens posted to the Beowulf mailing list asking if anyone sold power supplies in the kilowatt range so he could attach several nodes to a single power supply. He also asked if anyone had experience with them, what the price was, and if they were any good.


List contributor Alvin Oga posted that a 700W to 1000W power supply could probably power ten or so mini-ITX systems. He also thought that using a single power supply for multiple nodes could save a few dollars, although it may or may not be cheaper than getting a small power supply and case for each system.

Joel Jaeggli posted that he thought a single power supply for multiple nodes would require very large conductors. He recommended going with telecommunications DC power supplies (Note: Rackable Systems is already doing this in production racks). According to Joel, you could then power the nodes with DC-DC supplies, which are very efficient and very compact. Joel also thought that 1 kW would be enough for 4-6 dual Opteron nodes. Alvin Oga responded that telecommunications power supplies are much more expensive than typical power supplies (5x-10x in Alvin's opinion). He also pointed out that if you lose a power supply, you lose the compute capability of every node attached to it.
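The sizing figures in this exchange reduce to simple division. The short Python sketch below spreads a supply's rated output evenly across the attached nodes; it is only an illustration and deliberately ignores efficiency losses, power-on surges, and the headroom you would normally leave.

```python
def watts_per_node(supply_watts: float, nodes: int) -> float:
    """Average power available to each node from one shared supply (no derating)."""
    if nodes <= 0:
        raise ValueError("nodes must be a positive integer")
    return supply_watts / nodes


if __name__ == "__main__":
    # (supply rating in watts, number of nodes) as quoted in the thread
    for supply_w, node_count in [(700, 10), (1000, 10), (1000, 4), (1000, 6)]:
        per_node = watts_per_node(supply_w, node_count)
        print(f"{supply_w} W shared by {node_count} nodes: {per_node:.0f} W each")
```

With the numbers quoted above, ten mini-ITX boards on a 700 W to 1000 W supply get roughly 70 W to 100 W each, while 4-6 dual Opteron nodes on 1 kW get roughly 167 W to 250 W each.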

Frank Joerdens then responded that he thought large conductors might not be such a big deal because you could be creative and use aluminum tubes that double as part of the structure. He also thought that such large power supplies might become expensive.

Then Dr. Power himself (Jim Lux) posted to this thread. Jim said that with modern PWM power supply designs, the maximum efficiency is largely independent of the power output, so one large power supply consumes about the same power as a bunch of small ones. However, Jim pointed out that efficiency is not a big design driver for typical PC power supplies. He also offered some general observations. A single large power supply has fewer components than several small ones, so the probability of a failure is lower; however, if it does fail, the impact is larger. Jim also pointed out that running long (2-3 meter) cables to the motherboards would introduce problems of its own, because the resistance of the extra length causes a voltage drop. Finally, he mentioned that if you want to be able to remove nodes, you will have to think about connectors and/or service loops in the cables.
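Jim's point about cable length, and Joel's about large conductors, both follow from Ohm's law: a conductor of length L and cross-section A has resistance R = ρL/A, and carrying a current I through it drops V = IR. The minimal sketch below assumes copper conductors (ρ ≈ 1.68 × 10⁻⁸ Ω·m), counts both the supply and return runs, and uses illustrative numbers (15 A on a 12 V rail over 3 m) that do not come from the thread.

```python
# Copper resistivity at roughly 20 C, in ohm-metres (assumed conductor material).
COPPER_RESISTIVITY = 1.68e-8


def voltage_drop(current_a: float, one_way_length_m: float, area_mm2: float,
                 resistivity: float = COPPER_RESISTIVITY) -> float:
    """Volts lost across the supply and return conductors of a DC cable run."""
    area_m2 = area_mm2 * 1e-6                 # mm^2 -> m^2
    loop_length_m = 2 * one_way_length_m      # current flows out and back
    resistance_ohm = resistivity * loop_length_m / area_m2
    return current_a * resistance_ohm         # Ohm's law: V = I * R


if __name__ == "__main__":
    rail_v, current_a, run_m = 12.0, 15.0, 3.0   # illustrative values, not from the thread
    for label, area in [("18 AWG (0.823 mm^2)", 0.823), ("10 mm^2", 10.0)]:
        drop = voltage_drop(current_a, run_m, area)
        print(f"{label}: {drop:.2f} V drop ({100 * drop / rail_v:.1f}% of {rail_v:.0f} V)")
```

Under those assumptions an 18 AWG pair loses roughly 1.8 V, about 15% of a 12 V rail, while a 10 mm² pair loses about 0.15 V, which is why a shared supply pushes you toward very thick conductors or short runs.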

Finally, Frank Joerdens, the originator of the thread, posted that he agreed with Jim's comments and mentioned that veering away from COTS (Commercial Off The Shelf) doesn't really buy you anything.

This discussion was very interesting because it shows that people are "thinking outside of the box" to further improve clusters and that informed opinions are one of the trademarks of the Beowulf list.

Cooling Units? Raised Floor?

As you can tell from many of the postings to the Beowulf list in the last year, power consumption, power usage, and machine room design are becoming increasingly important issues. Brian Dobbins posted to the list that he had recently put together a machine room design but wanted to get opinions on cooling design and layout in general. He had some specific questions about his design and cited ClusterMonkey's own Robert Brown for his Linux Magazine article on machine room design (also see Getting Serious: Cluster Infrastructure).

Jim Lux was the first to respond to Brian's post. Jim thought the amount of cooling Brian proposed (4.7 tons) was fairly small (household AC units are 3-5 tons). He also thought that a raised floor was not a big issue if, like Brian, you have only one row of racks; if you have rows and rows of racks, then a raised floor might be a good idea. Jim suggested partitioning the systems across several UPS units so that they don't all go down together. Finally, he recommended a coat rack for jackets and a temperature/humidity recorder.
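Jim's suggestion to split the systems across several UPS units can be planned with something as simple as a round-robin assignment, so that adjacent nodes end up on different units and losing any one UPS takes down only about 1/N of the cluster. The sketch below is one illustrative way to do that, not Jim's method; the node and UPS names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def assign_to_ups(nodes: List[str], ups_units: List[str]) -> Dict[str, List[str]]:
    """Round-robin nodes across UPS units so neighbouring nodes land on different units."""
    if not ups_units:
        raise ValueError("need at least one UPS unit")
    plan: Dict[str, List[str]] = defaultdict(list)
    for i, node in enumerate(nodes):
        plan[ups_units[i % len(ups_units)]].append(node)
    return dict(plan)


if __name__ == "__main__":
    nodes = [f"node{i:02d}" for i in range(1, 9)]   # hypothetical node names
    ups = ["ups-a", "ups-b", "ups-c"]                # hypothetical UPS units
    for unit, members in assign_to_ups(nodes, ups).items():
        print(f"{unit}: {', '.join(members)}")
```

Round-robin is chosen here only because it is easy to reason about; any scheme that avoids putting a whole job's worth of nodes on one unit serves the same purpose.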

John Hearns responded that he thought Jim's suggestions were good ones. John went on to mention a machine room in England that he liked, where the power connectors came down from the ceiling. Robert Brown replied that they too had power connectors coming from the ceiling.

In a separate post, Robert added some items to the list of things a server room needs. The first item he would add is a workbench with various tools and good lighting. He would also add a small KVM switch, a flat panel keyboard/video, and various other little bits and bobs for doing your wizardry. Robert also recommended a nice swivel chair for working at the workbench, as well as headphones to cover up the machine room noise and to listen to your music collection while you work. More generally, he advised engineering the room for growth now, before the renovation work begins, for example by getting some additional HVAC capacity (perhaps 10 tons) so the room can handle future growth without having to be "re-renovated".
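Cooling capacity in these posts is quoted in tons of refrigeration. One ton is 12,000 BTU/h, or about 3.52 kW of heat removal, so the conversion below puts the 4.7 tons in Brian's design at roughly 16.5 kW and the 10 tons Robert mentioned at roughly 35 kW. The node count is a deliberately crude sketch: it assumes a hypothetical 250 W per node and ignores every other heat source in the room (UPS losses, lighting, people) as well as any safety margin.

```python
# 1 ton of refrigeration = 12,000 BTU/h, and 1 BTU/h is about 0.29307107 W.
WATTS_PER_TON = 12_000 * 0.29307107


def tons_to_kw(tons: float) -> float:
    """Heat-removal capacity in kilowatts for a cooling rating in tons."""
    return tons * WATTS_PER_TON / 1000


def nodes_supported(tons: float, watts_per_node: float) -> int:
    """Whole nodes whose heat the cooling can remove, ignoring all other heat sources."""
    return int(tons * WATTS_PER_TON // watts_per_node)


if __name__ == "__main__":
    assumed_node_watts = 250   # hypothetical per-node draw, not from the thread
    for tons in (4.7, 10):
        print(f"{tons} tons ~ {tons_to_kw(tons):.1f} kW ~ "
              f"{nodes_supported(tons, assumed_node_watts)} nodes at {assumed_node_watts} W")
```

At that assumed draw, 4.7 tons covers about 66 nodes and 10 tons about 140, before any allowance for the rest of the room.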

John Hearns jumped in to add that a nice 19 inch rack-mounted fridge would be good, as would a 4U wine rack, and provided links to both items. RGB (Robert Brown) then responded to John's posting with some sage advice about drinking and computing. However, one should always be open to new ideas and possibilities.

Robert then posted some details about the workbenches they use. He said that they use a leftover wooden workbench from a Physics lab (wood being the key word). He then detailed much of what they use when they diagnose/repair/build nodes. Finally, he had some advice about whether to choose a support option with the various vendors or to support the systems yourself.

Jakob Oestergaard had some very good advice for an addition to the perfect machine room: he thought a good first aid kit would be a very worthwhile addition. He also echoed other recommendations for several flashlights, and Chris Samuel added that you should have spare batteries for the flashlights as well.

The continuing discussion about the physical aspects of beowulfery is showing that people are seriously considering how to properly design their cluster environment.

Local Disk or NFS Root?

One of the topics discussed on the Beowulf mailing list is how people construct their clusters from an operating system perspective. There are many ways you can "construct" your cluster. On July 14, 2004, Brent Clements asked whether people preferred to install the operating system (OS) on each node or to use nfsroot for the compute nodes. Brent said that in his experiments, using nfsroot was a heck of a lot easier than maintaining a SystemImager configuration (they used SystemImager to install Linux on the nodes).

Tim Mattox replied that Brent had missed a third option: using a RAM disk as the root file system. Tim said that for years he had done both nfsroot and disk-full clusters (disk-full meaning each node has its own disk with the OS installed on it), and that the RAM disk approach is head and shoulders above the others. He recommended examining Warewulf, which he thought was a very good cluster distribution. In fact, he liked Warewulf so much that he became one of its developers. Tim also mentioned a drawback of the nfsroot approach: if the NFS server is rebooted or down for any length of time, the compute nodes tend to fail.


Mark Hahn said that he thought the nfsroot approach created quite a bit of traffic, but that it was incredibly convenient. Mark said that he has built a couple of clusters using nfsroot (around 100 dual nodes) and that there have not been any significant problems with the NFS server. He likes the nfsroot approach so much that if problems did arise, he would split the NFS traffic across two file servers rather than abandon nfsroot.
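Mark's fallback of splitting NFS traffic across two file servers amounts to pointing half of the nodes at each server. With the Linux kernel's nfsroot support, a diskless node learns where its root lives from kernel parameters such as root=/dev/nfs, nfsroot=<server>:<path>, and ip=dhcp. The sketch below generates one command line per node, alternating between two servers; the server addresses, export path, and node naming are hypothetical, and how the line reaches each node (for example through a network boot configuration) is left out.

```python
from typing import Dict, List


def nfsroot_cmdline(node_index: int, servers: List[str],
                    export: str = "/export/nodes/root") -> str:
    """Kernel command line for one diskless node; nodes alternate between servers."""
    server = servers[node_index % len(servers)]
    # root=/dev/nfs, nfsroot= and ip= are standard Linux nfsroot boot parameters.
    return f"root=/dev/nfs nfsroot={server}:{export} ip=dhcp"


def cmdlines_for_cluster(node_count: int, servers: List[str]) -> Dict[str, str]:
    """Map hypothetical node names (node001, node002, ...) to their command lines."""
    return {f"node{i + 1:03d}": nfsroot_cmdline(i, servers) for i in range(node_count)}


if __name__ == "__main__":
    file_servers = ["10.0.0.1", "10.0.0.2"]   # hypothetical NFS server addresses
    for node, cmdline in cmdlines_for_cluster(4, file_servers).items():
        print(f"{node}: {cmdline}")
```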

Tony Travis posted that he has a 32-node AMD Athlon cluster running ClusterNFS with an openMosix kernel. He exports the root partition read-only, and the compute nodes use symlinks for volatile files. Each compute node also has a local disk for temporary space and swap.
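A read-only shared root works because the few paths that must be writable are redirected to per-node storage. The sketch below shows the general idea with ordinary symlinks created in the exported root image, each pointing into a local scratch directory. It is a generic illustration, not a description of Tony's actual configuration or of ClusterNFS features, and the /scratch prefix and path list are hypothetical.

```python
import os
import tempfile
from pathlib import Path
from typing import List, Tuple


def link_volatile_paths(root_image: str, volatile: List[str],
                        local_prefix: str = "/scratch") -> List[Tuple[Path, Path]]:
    """Inside a shared root image, point each volatile path at per-node local storage.

    Existing entries are left untouched in this sketch; a real setup would move
    or remove them first.
    """
    root = Path(root_image)
    created = []
    for rel in volatile:
        link = root / rel
        target = Path(local_prefix) / rel     # resolved on each node at run time
        link.parent.mkdir(parents=True, exist_ok=True)
        if link.is_symlink() or link.exists():
            continue
        os.symlink(target, link)
        created.append((link, target))
    return created


def demo() -> List[Tuple[str, str]]:
    """Build a throwaway image directory and report the links created in it."""
    with tempfile.TemporaryDirectory() as image:
        made = link_volatile_paths(image, ["var/log", "tmp"])
        return [(str(link.relative_to(image)), os.readlink(link)) for link, _ in made]


if __name__ == "__main__":
    print(demo())
```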

Sean Dilda posted that he prefers the local disk approach, which he felt makes the cluster much more scalable. However, he did say that maintaining disk images was a pain, so he uses a Kickstart configuration rather than an image to make his life a bit easier.

Kimmo Kallio posted that he uses a solution where the nodes boot over the network and create a RAM disk to get things going. The next step in the boot process checks/creates the partitions and file systems, copies the root file system to the local drive, does a pivot_root, and abandons the RAM disk. Everything else is then copied over as .tar.gz files and untarred. He takes some other steps as well to build the nodes. He thought that this approach combines the performance and server independence of a local disk with the low maintenance of network booting.
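The last stage Kimmo describes, unpacking a set of .tar.gz archives onto the freshly prepared local root, maps directly onto Python's standard tarfile module. The sketch below is a hypothetical stand-in for that step only; partitioning, file system creation, and pivot_root are out of scope, and the archives are treated as trusted because they are assumed to come from the cluster's own boot server.

```python
import tarfile
import tempfile
from pathlib import Path
from typing import List, Tuple


def unpack_archives(archives: List[Path], target_root: Path) -> List[str]:
    """Extract each .tar.gz into target_root, in order, and return the archive names."""
    # Newer Pythons support extraction filters; archives here are assumed trusted,
    # so keep full metadata where the option exists.
    extract_kwargs = {"filter": "fully_trusted"} if hasattr(tarfile, "fully_trusted_filter") else {}
    unpacked = []
    for archive in archives:
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(path=target_root, **extract_kwargs)
        unpacked.append(Path(archive).name)
    return unpacked


def demo() -> Tuple[List[str], str]:
    """Pack a tiny hypothetical /opt tree, then unpack it into an empty 'new root'."""
    with tempfile.TemporaryDirectory() as tmp:
        work = Path(tmp)
        src = work / "src" / "opt" / "app"
        src.mkdir(parents=True)
        (src / "VERSION").write_text("1.0\n")
        archive = work / "opt.tar.gz"
        with tarfile.open(archive, "w:gz") as tar:
            tar.add(work / "src" / "opt", arcname="opt")
        new_root = work / "newroot"
        new_root.mkdir()
        names = unpack_archives([archive], new_root)
        return names, (new_root / "opt" / "app" / "VERSION").read_text()


if __name__ == "__main__":
    print(demo())
```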

Topics such as this one are always very interesting because they serve to develop "best practice" information for people considering clusters.

This article was originally published in ClusterWorld Magazine. It has been updated and formatted for the web. If you want to read more about HPC clusters and Linux you may wish to visit Linux Magazine.

Jeff Layton has been a cluster enthusiast since 1997 and spends far too much time reading mailing lists. He can be found hanging around the Monkey Tree at (don't stick your arms through the bars though).