The Beowulf mailing list provides detailed discussions about issues concerning Linux HPC clusters. In this article I review some postings to the Beowulf list on clusters of bare motherboards and choosing a high-speed interconnect.
Beowulf of Bare Motherboards?
The experimental, entrepreneurial spirit for clusters is alive and well! On Sept. 27, 2004, Andrew Pisorski asked about attaching bare motherboards directly to metal racks in some fashion. Although Andrew didn't say so, the goals of projects like this are to reduce cost (no cases) and to increase density at little or no extra cost (blade servers from vendors are fairly pricey). Jack Wathey responded that in late 2003 he had built a cluster of bare motherboards on metal shelves, with each motherboard mounted on an aluminum sheet using nylon standoffs. He also said that Ken Chase had done something similar, packing the motherboards close together in a Plexiglas case with no shielding between them. According to Jack, Ken ran into problems because of radio frequency interference between the boards, and some of them would not boot. Jack was also worried about the risk of fire with so much wood and plastic near high-power components.
Glenn Gardner responded that he had built a cluster in his apartment using mini-ITX motherboards. He said the project was fun and taught him several lessons. First, drilling all of those holes was a ton of work. Second, thermal issues are likely to be significant, so you will need fans to provide adequate cooling. Third, RFI (radio frequency interference) could be a big problem, so you should be prepared for it. Fourth, power distribution can be an issue: watch the startup power requirements and stagger the nodes as they boot. Fifth, mechanical integrity matters, so plan for it. He also pointed out that if you need to replace a motherboard, you could end up dismantling the entire cluster!
Alvin Oga jumped in with comments on his own mini-ITX project: a 4U blade system in which each blade carries two mini-ITX boards, for 10 CPUs per 4U.
Jim Lux then posted on the topic. He first commented on having to drill hundreds of holes to mount the motherboards and recommended contracting that work out. He also said that one advantage of dense packing is that you can use a few large-diameter fans, which move large amounts of air efficiently and fairly quietly (fan efficiency goes up as the diameter increases). Jim also offered a good discussion of shielding, pointing out that reasonably sized holes in the case won't hurt EMI performance and can allow a wireless signal to get into the system.
Florent Calvayrac posted a link to a small Beowulf he was involved with: an 8-CPU Athlon system whose motherboards were mounted inside a sheet-metal box. Florent said the design took about 20 hours and fabrication took about 1 week. Interestingly, some graduate students did a thermal/cooling analysis of the system and predicted its temperatures to within 1 °C.
Andrew Pisorski responded that he appreciated everyone's comments and then asked about using one power supply for several motherboards. He also wondered how he could stagger the boot sequence of the motherboards sharing a power supply to reduce the peak load. Jim Lux responded with some excellent insight into power usage at startup, pointing out that the biggest power draw at startup comes from the hard drive spinning up. Since this is a big peak-power issue, diskless nodes have a definite advantage.
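To see why staggering helps, here is a minimal sketch of the load on a shared power supply. It assumes each node draws a constant steady-state wattage once powered on, plus an extra surge for a fixed number of seconds while its hard drive spins up, and that node i is switched on i × stagger seconds after the first. The function names and all wattages are hypothetical and purely illustrative, not measurements from the list.

```python
def load_at(t, n_nodes, base_w, surge_w, surge_s, stagger_s):
    """Total supply load (watts) at second t. Hypothetical model:
    node i powers on at i * stagger_s, draws base_w from then on,
    plus surge_w while its drive spins up (surge_s seconds)."""
    total = 0.0
    for i in range(n_nodes):
        start = i * stagger_s
        if t < start:
            continue  # this node has not been switched on yet
        total += base_w  # steady-state draw once powered on
        if t < start + surge_s:
            total += surge_w  # drive still spinning up
    return total


def peak_load(n_nodes, base_w, surge_w, surge_s, stagger_s):
    """Peak load over the whole boot sequence, sampled once per second."""
    end = (n_nodes - 1) * stagger_s + surge_s + 1
    return max(load_at(t, n_nodes, base_w, surge_w, surge_s, stagger_s)
               for t in range(end))


# Illustrative numbers only: 4 nodes, 60 W steady, +25 W for 10 s of spin-up.
print(peak_load(4, 60, 25, 10, stagger_s=0))   # all nodes at once
print(peak_load(4, 60, 25, 10, stagger_s=10))  # one node every 10 s
```

With everything switched on together the surges add up (4 × 85 W = 340 W); with a 10-second stagger only one drive is spinning up at a time, so the peak drops to 4 × 60 W + 25 W = 265 W. Removing the disks removes the surge term entirely, which is the advantage of diskless nodes.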
Choosing a High-speed Interconnect
[Note: Since this discussion, we have posted our Cluster Interconnects: The Whole Shebang review. You may find that article helpful in addition to the comments below.]

Everyone loves speed. We're all speed junkies at heart. On October 12, 2004, Chris Sideroff asked about selecting a high-speed cluster interconnect. His group has a 30-node dual-Opteron cluster with GigE (Gigabit Ethernet) and wanted to upgrade to something like Myrinet, Quadrics, or InfiniBand (IB). Chris later mentioned that they were running Computational Fluid Dynamics (CFD) codes, including Fluent.