The Beowulf mailing list provides detailed discussions about issues concerning Linux HPC clusters. In this article I review some postings to the Beowulf list on getting KVM (keyboard/video/mouse) access to compute nodes, as well as a proposed Fedora cluster project.
KVM to a Compute Node?
One of the questions or conundrums for clusters has been the most efficient or cost-effective way to get a keyboard/video/mouse to a node that you are interested in seeing. On May 28, 2004, Suvendra Nath Dutta asked what kind of solution people use to access individual nodes rather than KVM systems. Guy Coates was the first to respond, suggesting good old-fashioned serial consoles that came over with the Pilgrims (well, maybe not that old). He mentioned that newer servers give you BIOS access over serial as well. He went on to mention that the 2.6 kernel series has a netconsole feature that sends console output over UDP, but he pointed out that you can't get BIOS access or early boot access with netconsole.
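To make the netconsole idea concrete, here is a minimal sketch of a receiver that could run on a head node and collect the UDP datagrams that compute nodes send. It is not something posted in the thread. It assumes the sending nodes target UDP port 6666 (netconsole's usual default target port), and the function names are hypothetical.

```python
import socket

# Assumption: the nodes' netconsole targets this port (6666 is the usual default).
NETCONSOLE_PORT = 6666


def open_listener(bind_addr="0.0.0.0", port=NETCONSOLE_PORT):
    """Open a UDP socket that netconsole senders can target."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((bind_addr, port))
    return sock


def read_messages(sock, count, timeout=5.0):
    """Collect up to `count` datagrams as (sender_ip, text) pairs.

    Stops early if no datagram arrives within `timeout` seconds.
    """
    sock.settimeout(timeout)
    messages = []
    for _ in range(count):
        try:
            data, (sender_ip, _sender_port) = sock.recvfrom(4096)
        except socket.timeout:
            break
        text = data.decode("utf-8", errors="replace").rstrip("\n")
        messages.append((sender_ip, text))
    return messages
```

A head node would call `open_listener()` once and then loop over `read_messages(listener, 1)`, printing or logging each pair. As Guy noted, nothing arrives until the node's kernel and network driver are up, so this does not help with BIOS or early boot problems.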
Robert Brown joined in to say that he really dislikes serial consoles and didn't like KVMs any better. He suggested three things: (1) get nodes that can boot without a keyboard/monitor attached; (2) get a PXE NIC for the nodes and PXE boot; and (3) invest in a "cluster crash cart." A "cluster crash cart" is simply a rolling cart with a monitor (Bob recommends a flat monitor), keyboard and mouse, and a small UPS (a UPS is suggested just in case you need power for your monitor). Robert also suggested that you might want a 4-port KVM. He estimated that a complete crash cart costs about $400-$500. In a later post Robert (also referred to as rgb) said that in general serial consoles were not worth the trouble since they are rarely used, although he did mention that a serial terminal server might be worth it for people who need to telnet/ssh into a port and get to a real console.
Of course, Robert's comments started some debate. Guy Coates jumped in to say that he thought terminal consoles were worth the extra cost even for occasional use, because he uses the time he saves not running to the machine room to help users. Robert responded in a fairly lengthy email that he both agreed and disagreed with Guy. He thought that if you need remote management then serial terminal servers can be a cost-effective solution. However, Robert went on to say that he thought there were better approaches. Robert's well thought out argument was that remote terminal access is mostly used for "bouncing" (i.e., rebooting) nodes that are locked up. He thought that a PXE/WOL (WOL = Wake On LAN) system might be better because you can force the node to reboot and you can control the "personality" that gets sent to the node by choosing the appropriate image. For example, you could send a diskless/repair boot image that would allow the node to come up, and then you could plug into it for repair or even work on it over the network. Robert went on to point out that the one remaining obstacle in using PC hardware today is accessing the BIOS over the network.
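For readers who have not seen the WOL half of that approach, the sketch below builds and sends a standard Wake-on-LAN "magic packet": six 0xFF bytes followed by the target MAC address repeated 16 times, usually sent as a UDP broadcast. The function names, the broadcast address, and the choice of UDP port 9 are illustrative assumptions, not details from the thread. Note that a magic packet only wakes a node that is powered down with WOL enabled on its NIC; by itself it will not reset a node that is hung.

```python
import socket


def build_magic_packet(mac):
    """Return the 102-byte WOL payload for a MAC such as '00:11:22:33:44:55'."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError("expected a 6-byte MAC address, got %r" % mac)
    mac_bytes = bytes.fromhex(hex_digits)  # raises ValueError on non-hex input
    # Six 0xFF bytes, then the MAC address repeated 16 times.
    return b"\xff" * 6 + mac_bytes * 16


def send_magic_packet(mac, broadcast="255.255.255.255", port=9):
    """Broadcast the magic packet on the local network (port 9 is a common choice)."""
    packet = build_magic_packet(mac)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(packet, (broadcast, port))
```

Calling `send_magic_packet("00:11:22:33:44:55")` from the head node would wake that node, which could then PXE boot whatever image the boot server has been told to hand it.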
Michael Will from Penguin Computing posted that they put a small serial remote control card into their boxes that can simulate the pressing of the power and reset buttons. This feature allows you to reset even a hard stuck box. He also said that they use Cyclades terminal servers in their clusters for console access.
Bob was impressed with the serial remote power solution but pointed out an important trade-off that many people miss. For the price of a terminal console setup including the cables, etc., you can buy extra nodes and then just walk to the server room to reset a stuck box. In effect, these extra box(es) buy you a "cushion" of how many nodes have to be up at any one time.
Jerker Nyberg posted that a good solution is to just get a motherboard with IPMI (Intelligent Platform Management Interface). He gave some simple steps on how to get IPMI configured on a system that will allow you to have console over the network. As Jerker points out, this comes without the need for extra serial cables. However, he did mention that there were some problems with it including using the enter key when accessing the BIOS.
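Jerker's actual configuration steps are not reproduced here. As a rough illustration of what console and power control over IPMI can look like, the sketch below assembles command lines for the `ipmitool` utility. That tool is my assumption; the thread mentions FreeIPMI and OpenIPMI instead. The `lanplus` interface used here assumes a BMC that speaks IPMI 2.0, and the host and user names are placeholders. The `-E` flag tells `ipmitool` to read the password from the `IPMI_PASSWORD` environment variable rather than the command line.

```python
# Hypothetical action names mapped to ipmitool subcommands.
ACTIONS = {
    "console": ["sol", "activate"],          # Serial Over LAN console
    "status": ["chassis", "power", "status"],
    "reset": ["chassis", "power", "reset"],  # hard reset of a stuck node
}


def ipmitool_command(host, user, action):
    """Build (but do not run) an ipmitool argument list for a node's BMC."""
    if action not in ACTIONS:
        raise ValueError("unknown action: %r" % action)
    base = ["ipmitool", "-I", "lanplus", "-H", host, "-U", user, "-E"]
    return base + ACTIONS[action]
```

The resulting list can be passed to `subprocess.run()` on a machine that has `ipmitool` installed and `IPMI_PASSWORD` set. Keeping command construction separate from execution makes it easy to log or review what would be run against a node before doing it.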
A couple of open-source IPMI implementations were mentioned (FreeIPMI and OpenIPMI). However, it was mentioned via a link in a post from Cedric Lambert that Serial Over LAN (SOL) solutions were non-standard in anything before IPMI 2.0. So, until the IPMI implementations by the various motherboard manufacturers reach version 2.0, the old SOL solutions, which are proprietary and non-standard, will probably not be supported in the open source projects mentioned above. [One could say an open support for SOL is SOL until version 2 - Ed.]
Several later posts mentioned other alternatives. One is to use serial over USB with some software glue (an inexpensive concept from Julien Leduc). Mike Davis said that the combination of a terminal server and remote power using something like APC PDUs is a powerful and easy concept. Finally, Angel Rivera posted to say that while terminal servers and their ilk were very nice, he still preferred the old crash cart method.
This discussion took place about two years ago, and it looks like a number of people preferred the "crash cart" approach. In my experience, if the machine is close enough and nodes don't go down too often, then this approach makes a lot of sense. However, if the nodes are fairly far away or are going down fairly regularly, I would invest in some kind of terminal server or IPMI implementation (although if the nodes are going down fairly regularly then I would assume there is a serious problem and I would likely be in the machine room trying to debug the problem or yelling at the vendor :) ).
I used to think this question was open only for clusters up to a certain size (a couple of hundred nodes), and that for large clusters (over 500 nodes) you would obviously want a way to access the console of each node and perhaps to update the BIOS. However, I'm beginning to think that even for large clusters this question is still somewhat open. There are many reasons for this, and it would require a whole other article or column to discuss them.
Fedora Cluster Project?
On June 9, 2004, Mitchell Skinner asked about starting a Fedora Cluster Project. He thought that this project would have some appeal relative to other cluster-oriented distributions, including a more up-to-date base and the benefit of Fedora's testing and engineering. He was thinking about organizing such an effort along the lines of OSCAR, but not quite the same. He did mention one downside -- Fedora changes very quickly and may not be as stable as other distributions.
The ever-present Robert Brown was one of the first to post, agreeing with many of Mitch's comments. However, Robert didn't feel that Fedora was always going to be an "unstable" distribution. He thought it was going through growing pains as many other distributions have done. In fact, Robert thought that Mitch's idea was "peachy." Robert went on to say that he was looking for a cluster "layer" that could be applied on top of Fedora. He was hoping it would be something of an add-on rather than part of a specific distribution. In fact, he thought it would be nice if it worked with any RPM-based distribution.
There was a subsequent discussion about the relative merits of a distribution that is supported by a large community, such as Fedora, even though it's not necessarily focused on clusters, and a distribution that is still a general distribution with more of a cluster focus, such as cAos. One of the most important points that came out of the discussion, as posted by Joe Landman, was that ISVs (Independent Software Vendors) focus on only a small number of distributions to keep their costs down. However, a number of clusters depend upon these commercial applications, so they are limited in which Linux distribution they can choose.
There was a related thread that started on June 16, 2004 from a post by John Hearns, who asked about SLC3 (Scientific Linux - CERN) forming the basis of a good cluster distribution (Note: at the end of the column is a link to SLC4, which is a newer version of SLC). SLC3 is based on RHEL 3 with the required trademarks and copyrights removed. It is a collaboration between CERN and Fermilab (Fermilab's version is SLF). One of the developers of Scientific Linux, Connie Sieh from Fermilab, then posted and explained the project, giving some very good links and also mentioning the SL mailing lists.
Eray Ozkural then posted and asked a related question - what Beowulf software makes it easiest to install a node from scratch (in essence the easiest way to build a cluster)? He mentioned that he was using Debian and FAI (Fully Automatic Installation) and was disappointed at the number of bugs.
John Hearns commented that he thought whatever did PXE booting was the best. Tim Mattox also posted to say that he thought Warewulf is a very good solution because Warewulf can netboot (PXE or Etherboot) using an image supplied by the master node. This image is created from the master node so it is essentially distribution agnostic (Warewulf required an "rpm based" distribution at the time but now supports Debian distributions). Mark Hahn added that the best Beowulf package was one that didn't install anything on the nodes. Mark said that booting over the net (dhcp/tftp/pxe) was very reliable. He went on to say that he didn't even bother with initrd's. He just shipped a monolithic kernel to the compute nodes and mounted root over NFS. Rafael Leiva recommended Quattor, the automatic installation and configuration tool used at CERN.
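The netboot-plus-NFS-root style that Mark describes can be pictured with a small generator for per-node PXELINUX configuration files. This is a sketch under assumptions, not his setup: PXELINUX as the boot loader, the kernel file name, the server address, and the export path are all hypothetical. What the sketch does rely on is that PXELINUX looks up a per-node file under `pxelinux.cfg/` named `01-` followed by the node's MAC address in lowercase with dashes, and that the kernel parameters `root=/dev/nfs`, `nfsroot=` and `ip=dhcp` request an NFS-mounted root.

```python
def pxelinux_config_name(mac):
    """Per-node config file name: '01-' plus the MAC in lowercase, dash-separated."""
    return "01-" + mac.lower().replace(":", "-")


def pxelinux_config(kernel, nfs_server, nfs_root):
    """Return a PXELINUX config that boots `kernel` with its root on NFS.

    There is no INITRD line, matching the monolithic-kernel approach.
    """
    lines = [
        "DEFAULT node",
        "LABEL node",
        "  KERNEL %s" % kernel,
        "  APPEND root=/dev/nfs nfsroot=%s:%s ip=dhcp ro" % (nfs_server, nfs_root),
    ]
    return "\n".join(lines) + "\n"
```

Writing the output of `pxelinux_config("vmlinuz-node", "10.0.0.1", "/export/nodes/root")` to `pxelinux.cfg/` under the name returned by `pxelinux_config_name()` gives that node its own boot "personality". Swapping the file for one that points at a repair image is one way to realize the per-node image selection Robert Brown described in the KVM thread.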
Daniel Ridge had a very interesting post with an important point. He said: "... cluster people fixate readily on concepts like 'install' and 'manage' and easily lose sight of the people who actually create new and interesting applications - not management metaware for frobbing the machine itself - but actual programs that accomplish some useful purpose." This point is especially important because ultimately clusters should be doing real work.
These kinds of discussions are a good indication that there is no single best way to deploy a cluster software stack. Running commercial software on "open distributions" complicates things, but some ISVs (e.g., Pathscale) are becoming much smarter and support glibc/kernel combinations rather than limiting themselves to a few commercial distributions. These threads also point out the importance of keeping an eye on the applications when considering a cluster design.
Sidebar One: Links Mentioned in Column
This article was originally published in ClusterWorld Magazine. It has been updated and formatted for the web. If you want to read more about HPC clusters and Linux you may wish to visit Linux Magazine.
Jeff Layton has been a cluster enthusiast since 1997 and spends far too much time reading mailing lists. He can be found hanging around the Monkey Tree at ClusterMonkey.net (don't stick your arms through the bars though).