[Beowulf] Which distro for the cluster?
gdjacobs at gmail.com
Fri Dec 29 05:54:30 EST 2006
I don't actually take any advocacy position on choice of distro. RH,
Debian, BSD, I don't care. Any contrary statements are made strictly in
the interest of the truth.
Robert G. Brown wrote:
> On Fri, 29 Dec 2006, Andrew M.A. Cater wrote:
> Also, how large are those speed advantages? How many of them cannot
> already be obtained by simply using a good commercial compiler and
> spending some time tuning the application? Very few tools (ATLAS
> being a good example) really tune per microarchitecture. The process
> is not linear, and it is not easy. Even ATLAS tunes "automatically"
> more from a multidimensional gradient search based on certain
> assumptions -- I don't think it would be easy to prove that the
> optimum it reaches is a global optimum.
It most definitely isn't. Goto trounces it easily. ATLAS is the first
stab at an optimized BLAS library before the hand coders go to work.
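To illustrate the local-optimum point: a toy sketch (not ATLAS itself; the cost table is invented) of how a greedy search over a tuning parameter, e.g. a blocking factor, can settle on a local optimum of a bumpy cost surface:

```python
# Toy illustration: greedy neighborhood search over one "blocking" parameter.
# The cost table is invented; it has a local minimum at b=3 and the global
# minimum at b=7, so the search result depends on the starting point.
def cost(b):
    return {1: 9, 2: 6, 3: 5, 4: 8, 5: 7, 6: 4, 7: 2, 8: 3}.get(b, 10)

def greedy(start):
    b = start
    while True:
        # move to the cheapest immediate neighbor; stop when none is better
        best = min((b - 1, b, b + 1), key=cost)
        if best == b:
            return b
        b = best

print(greedy(2))  # -> 3, a local optimum; the global optimum is at 7
print(greedy(6))  # -> 7, only because this start is in the right basin
```

Real autotuners search several interacting dimensions at once, which only makes the basins harder to escape.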
> No, not joking at all. FC is perfectly fine for a cluster,
> especially one built with very new hardware (hardware likely to need
> a very recent kernel and libraries to work at all) and actually
> upgrades-by-one tend to work quite well at this point for systems
> that haven't been overgooped with user-level crack or homemade stuff
> overlaid outside of the RPM/repo/yum ritual.
> Remember, a cluster node is likely to have a really, really boring
> and very short package list. We're not talking about major overhauls
> in X or gnome or the almost five thousand packages in extras having
> much impact -- it is more a matter of the kernel and basic libraries,
> PVM and/or MPI and/or a few user's choice packages, maybe some
> specialty libraries. I'm guessing four or five very basic package
> groups and a dozen individual packages and whatever dependencies they
> pull in. Or less. The good thing about FC >>is<< the relatively
> rapid renewal of at least some of the libraries -- one could die of
> old age waiting for the latest version of the GSL, for example, to
> get into RHEL/Centos. So one possible strategy is to develop a very
> conservative cluster image and upgrade every other FC release, which
> is pretty much what Duke does with FC anyway.
I'd rather have volatile user-level libraries and stable system level
software than vice versa. Centos users need to be introduced to the
lovely concept of backporting.
> Also, plenty of folks on this list have done just fine running
> "frozen" linux distros "as is" for years on cluster nodes. If they
> aren't broke, and live behind a firewall so security fixes aren't
> terribly important, why fix them? I've got a server upstairs (at
> home) that is still running <blush> RH 9. I keep meaning to upgrade
> it, but I never have time to set up and safely solve the
> bootstrapping problem involved, and it works fine (well inside a
> firewall and physically secure).
Call me paranoid, but I don't like the idea of a Cadbury Cream Egg
security model (hard outer shell, soft gooey center). I won't say more,
'cuz I feel like I've had this discussion before.
Upgrade it, man. Once, when I was bored, I installed apt-rpm on a RH8
machine to see what dist-upgrade looked like in the land of the Red Hat.
Interesting experience, and it worked just fine.
> Similarly, I had nodes at Duke that ran RH 7.3 for something like
> four years, until they were finally reinstalled with FC 2 or
> thereabouts. Why not? 7.3 was stable and just plain "worked" on at
> least these nodes; the nodes ran just fine without crashing and
> supported near-continuous computation for that entire time. So one
> could also easily use FC-whatever by developing and fine tuning a
> reasonably bulletproof cluster node configuration for YOUR hardware
> within its supported year+, then just freeze it. Or freeze it until
> there is a strong REASON to upgrade it -- a miraculously improved
> libc, a new GSL that has routines and bugfixes you really need,
> superyum, bproc as a standard option, cernlib in extras (the latter a
> really good reason for at least SOME people to upgrade to FC6:-).
Or use a distro that backports security fixes into affected packages
while maintaining ABI and API stability. Gives you a frozen target for
your users and more peace of mind.
> Honestly, with a kickstart-based cluster, reinstalling a thousand
> nodes is a matter of preparing the (new) repo -- usually by rsync'ing
> one of the toplevel mirrors -- and debugging the old install on a
> single node until satisfied. One then has a choice between a yum
> upgrade or (I'd recommend instead) yum-distributing an "upgrade"
> package that sets up e.g. grub to do a new, clean, kickstart
> reinstall, and then triggers it. You could package the whole thing
> to go off automagically overnight and not even be present -- the next
> day you come in, your nodes are all upgraded.
Isn't automatic package management great? Like crack on gasoline.
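The "upgrade package" trick RGB describes can be sketched as an RPM whose %post stages the installer kernel and flips grub; everything below (paths, URL, menu position, the grub-legacy "--once" trick) is hypothetical and would need checking against your own grub.conf:

```
# Hypothetical %post scriptlet for a "node-reinstall" RPM (grub-legacy era)
%post
cp /usr/share/node-reinstall/vmlinuz /boot/vmlinuz-reinstall
cp /usr/share/node-reinstall/initrd.img /boot/initrd-reinstall.img
cat >> /boot/grub/grub.conf <<'EOF'
title Kickstart reinstall
        root (hd0,0)
        kernel /vmlinuz-reinstall ks=http://install.example.org/ks/node.cfg ksdevice=eth0
        initrd /initrd-reinstall.img
EOF
# boot the new entry once only, then reboot overnight
echo "savedefault --default=2 --once" | grub --batch
echo "/sbin/reboot" | at 3am
```

Push that package out with yum and the fleet reinstalls itself while you sleep.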
> I used to include a "node install" in my standard dog and pony show
> for people come to visit our cluster -- I'd walk up to an idle node,
> reboot it into the PXE kickstart image, and talk about the fact that
> I was reinstalling it. We had a fast enough network and tight enough
> node image that usually the reinstall would finish about the same
> time that my spiel was finished. It was then immediately available
> for more work. Upgrades are just that easy. That's scalability.
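The PXE side of that demo is just a pxelinux entry pointing at the installer with a ks= argument; server name and paths below are hypothetical:

```
# /tftpboot/pxelinux.cfg/default (sketch; URL and paths hypothetical)
default ks
prompt 0
label ks
    kernel fc6/vmlinuz
    append initrd=fc6/initrd.img ks=http://install.example.org/ks/node.cfg ksdevice=eth0
```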
> Warewulf makes it even easier -- build your new image, change a
> single pointer on the master/server, reboot the cluster.
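And that "single pointer" is typically just a symlink. A minimal sketch, demonstrated in a throwaway directory since the real link would live in the image tree (paths hypothetical):

```shell
# Demonstrate the pointer flip in a scratch directory; on a live diskless
# setup the symlink would sit wherever the nodes' boot path resolves it.
cd "$(mktemp -d)"
mkdir fc5-image fc6-image            # two built node images (stand-ins)
ln -sfn fc5-image current            # nodes currently boot fc5-image
ln -sfn fc6-image current            # the whole "upgrade": repoint the link
readlink current                     # prints: fc6-image
# on a real cluster you would now reboot the nodes, e.g. with pdsh -a reboot
```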
> I wouldn't advise either running upgrades or freezes of FC for all
> cluster environments, but they certainly are reasonable alternatives
> for at least some. FC is far from laughable as a cluster distro.
What I'd like to see is an interested party that would implement a
good, long-term security management program for FC(2n+b) releases. RH
obviously won't do this.
> Yeah, I dunno about SuSE. I tend to include it in any list because
> it is a serious player and (as has been pointed out already in this
> thread e.g. deleted below) only the serious players tend to attract
> commercial/supported software companies. Still, as long as it and RH
> maintain ridiculously high prices (IMHO) for non-commercial
> environments I have a hard time pushing either one native anywhere
> but in a corporate environment or a non-commercial environment where
> their line of support or a piece of software that "only" runs on e.g.
> RHEL or SuSE is a critical issue. Banks need super conservatism and
> can afford to pay for it. Cluster nodes can afford to be agile and
> change, or not, as required by their function and environment, and
> cluster builders in academe tend to be poor and highly cost sensitive.
> Most of them don't need to pay for either one.
> Not to argue, but Scientific Linux is (like Centos) recompiled RHEL
> and also has a large set of these tools including some
> physics/astronomy related tools that were, at least, hard to find
> other places. However, FC 6 is pretty insane. There are something
> like 6500 packages total in the repo list I have selected in yumex on
> my FC 6 laptop (FC itself, livna, extras, some Duke stuff, no
> freshrpms). This number seems to have increased by around 500 in the
> last four weeks IIRC -- I'm guessing people keep adding stuff to
> extras and maybe livna. At this point FC 6 has e.g. cernlib,
> ganglia, and much more -- I'm guessing that anything that is in SL is
> now in FC 6 extras as SL is too slow/conservative for a lot of
> people (as is the RHEL/Centos that is its base).
Do _not_ start a contest like this with the Debian people. You _will_ lose.
> Debian may well have more stuff, or better stuff for doing numerical
> work -- I personally haven't done a detailed package-by-package
> comparison and don't know. I do know that only a tiny fraction of
> all of the packages available in either one are likely to be relevant
> to most cluster builders, and that it is VERY likely that anything
> that is missing from either one can easily be packaged and added to
> your "local" repo with far less work than what is involved in
> learning a "new" distro if you're already used to one.
Agreed, and security is not as much of a concern with such user-level
programs, so these packages don't necessarily have to follow any
security patching regime.
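And the "package it yourself" path really is lightweight. A sketch, with the build step commented out since it needs rpmbuild/createrepo on a real box, and with hostname and paths entirely hypothetical:

```shell
# Server side (commented out; needs rpmbuild and createrepo installed):
#   rpmbuild -ba mytool.spec
#   cp /usr/src/redhat/RPMS/x86_64/mytool-1.0-1.x86_64.rpm /var/www/html/local/
#   createrepo /var/www/html/local/

# Client side: a one-file yum repo definition (host hypothetical)
cat > local.repo <<'EOF'
[local]
name=Local cluster packages
baseurl=http://server.example.org/local
enabled=1
gpgcheck=0
EOF
grep baseurl local.repo    # prints: baseurl=http://server.example.org/local
```

Drop the file into /etc/yum.repos.d/ on the nodes and the local packages ride along with every yum update.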
> The bottom line is that I think that most people will find it easiest
> to install the linux distro they are most used to and will find that
> nearly any of them are adequate to the task, EXCEPT (as noted)
> non-packaged or poorly packaged distros, e.g. Gentoo and Slackware.
> Scaling is everything. Scripted installs (ideally FAST scripted
> installs) and fully automated maintenance from a common and
> user-modifiable repo base are a necessity. There is no question that
> Debian has this. There is also no question that most of the
> RPM-based distros have it as well, and at this point with yum they
> are pretty much AS easy to install and update and upgrade as Debian
> ever has been. So it ends up being a religious issue, not a
> substantive one, except where economics or task specific
> functionality kick in (which can necessitate a very specific distro
> choice even if it is quite expensive).
I haven't used a RH based machine which regularly synced against a
fast-moving package repository, so I can't really compare. :)
> Excellent advice. Warewulf in particular will help you learn some of
> the solutions that make a cluster scalable even if you opt for some
> other paradigm in the end.
> A "good" solution in all cases is one where you prototype with a
> server and ONE node initially, and can install the other six or seven
> by at most network booting them and going off to play with your wii
> and drink a beer for a while. Possibly a very short while. If, of
> course, you managed to nab a wii (we hypothesized that wii stands for
> "where is it?" and not "wireless interactive interface" while
> shopping before Christmas...;-). And like beer.
Prototyping is absolutely necessary for any large-scale roll out. Better
to learn how to do it right.
> Yeah, kickstart is lovely. It isn't quite perfect -- I personally
> wish it were a two-phase install, with a short "uninterruptible"
> installation of the basic package group and maybe X, followed by a
> yum-based overlay installation of everything else that is entirely
> interruptible and restartable. But then, I <sigh> install over DSL
> lines from home sometimes and get irritated if the install fails for
> any reason before finishing, which over a full day of installation
> isn't that unlikely...
> Otherwise, though, it is quite decent.
> Oooo, that sounds a lot like using yum to do a RPM-based install from
> a "naked" list of packages and PXE/diskless root. Something that
> I'd do if my life depended on it, for sure, but way short of what
> kickstart does and something likely to be a world of
> fix-me-up-after-the-fact pain. kickstart manages e.g. network
> configuration, firewall setup, language setup, time setup, KVM setup
> (or not), disk and raid setup (and properly layered mounting),
> grub/boot setup, root account setup, more. The actual installation of
> packages from a list is the easy part, at least at this point, given
> dpkg and/or yum.
I believe Debian systems do more of their configuration at package
configuration time than in the installer, as compared with RH, but I
mostly agree with you. It's also way short of what FAI, Replicator, and
SystemImager do.
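For reference, the non-package half of a node kickstart is only a dozen or so directives; everything below (URL, password, partition sizes, package set) is hypothetical:

```
# Minimal compute-node ks.cfg sketch (FC-era syntax; values hypothetical)
install
url --url http://install.example.org/fc6/os/
lang en_US.UTF-8
keyboard us
network --device eth0 --bootproto dhcp
rootpw changeme
firewall --disabled
timezone --utc America/New_York
bootloader --location=mbr
clearpart --all --initlabel
part /     --fstype ext3 --size 8192
part swap  --size 2048
reboot
%packages
@base
lam
gsl
```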
> Yes, one can (re)invent many wheels to make all this happen --
> package up stuff, rsync stuff, use cfengine (in FC6 extras:-), write
> bash or python scripts. Sheer torture. Been there, done that, long
> ago and never again.
Hey, some people like this. Some people compete in Japanese game shows.
Geoffrey D. Jacobs
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf