[Beowulf] Hypothetical Situation
afant at geekmail.cc
Thu Jan 22 11:29:30 EST 2004
Good to see someone from Rice on here. I spent a couple of years at Rice
in the mid-nineties myself, in Biochemistry and Cell Biology.
To be clear, do you want a "real" Beowulf with a single system image and
unified process space, or are you more interested in a compute farm,
where each system runs its own copy of the OS? If it's the former,
I can't be much help.
If it is the latter, is there a reason why User-mode Linux (UML) is not an
option for the researchers? Are they part of the CITI crew and doing
hardcore OS work, or are they computational science types whose software
is just very particular about kernel versions and the like? It would
probably be far easier to keep a stable, minimal OS on each node and mount
loopback filesystems in which the users keep their custom environments.
Alternatively, if you have to go the bare-metal route, SystemImager might
be an option, provided the time needed to reimage a node after job
submission is not deemed excessive. Either way, you may want to look at a
fast dedicated network for reimaging to keep the time cost low.
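To put a rough number on the reimaging cost, here is a back-of-the-envelope
sketch in Python. The image size, link speeds, and the 70% efficiency factor
are made-up illustration values, not measurements, and the function name is
just something I picked. It only models pushing one image to one node and
ignores reboot time and any contention on the network.

```python
def reimage_seconds(image_bytes, link_bits_per_second, efficiency=0.7):
    """Estimate the seconds needed to push one image to one node.

    efficiency is an assumed fraction of the raw link speed that is
    actually usable for the transfer (protocol overhead, disk speed, ...).
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    usable_bytes_per_second = link_bits_per_second * efficiency / 8
    return image_bytes / usable_bytes_per_second


if __name__ == "__main__":
    image = 2 * 10**9  # hypothetical 2 GB node image
    for name, speed in (("100 Mbit/s", 100e6), ("1 Gbit/s", 1e9)):
        minutes = reimage_seconds(image, speed) / 60
        print(f"{name}: about {minutes:.1f} minutes per node")
```

Since every job would pay this cost at least once (the rebuild afterwards),
even a crude estimate like this helps decide whether a dedicated network is
worth it.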
Normally, I am a big fan of LSF, but from the sound of these requirements,
you would probably do better to work with SGE (Grid Engine), since the
situation you describe would require a couple of reboots per job if real
installs are needed. I don't think any available batch system can cleanly
handle reboots as part of a prologue script, but since SGE is open source,
you at least have a chance to hack the code to try and fix it.
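To make the control flow concrete, here is a minimal Python sketch of the
boot / wait / run / rebuild cycle. It is not SGE or LSF code and uses no
batch-system API: boot_kernel, is_up, run_job, and reimage_stable are
hypothetical callables you would have to supply yourself, and the timeout
values are arbitrary. I am assuming the logic runs somewhere other than
the node being rebooted (a head node, say), since a script running on the
node itself would not survive the reboot.

```python
import time


def run_on_custom_kernel(node, boot_kernel, is_up, run_job, reimage_stable,
                         timeout=600.0, poll=5.0,
                         clock=time.monotonic, sleep=time.sleep):
    """Boot node into the user's kernel, run the job, then restore it.

    Returns "completed" if the job ran, or "cancelled" if the node did
    not come back within timeout seconds. In both cases, and also if
    something raises, the node is rebuilt with the stable image.
    """
    try:
        boot_kernel(node)              # install user kernel and reboot
        deadline = clock() + timeout
        while not is_up(node):         # wait for the node to come back
            if clock() >= deadline:
                return "cancelled"     # node never came back up
            sleep(poll)
        run_job(node)
        return "completed"
    finally:
        reimage_stable(node)           # always rebuild afterwards
```

The clock and sleep parameters are only there so the timeout logic can be
exercised without real waiting; in normal use the defaults would do.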
Have you given any thought to which distro you intend to use? I will admit
to being a Gentoo fanatic, but it sounds as if you might want to look into
it. Gentoo is a source-based Linux distribution that allows the
installation to be tailored fairly tightly to the application at hand. If
the users are demanding their preferred kernels, I wouldn't be surprised if
they started wanting their own toolchains and application versions in the
near future. Gentoo would allow them to build exactly what they want in a
chrooted directory that could then be used either as the loopback
filesystem for UML or as the golden client to upload to SystemImager. You
can find out more from www.gentoo.org or check on irc.freenode.net in
This sounds like an interesting problem. I'd be glad to answer more
questions or serve as a sounding board if you want. Is Cathy F. still
working on networking stuff in Mudd? If she is, and you see her, please
give her my regards.
Hope this helps,
--On Thursday, January 22, 2004 09:29:09 -0600 "Brent M. Clements"
<bclem at rice.edu> wrote:
> We have a request by some of our research scientists to form a beowulf
> cluster that provides the following abilities:
> 1. It must be part of a shared computing facility controlled by a batch
> queuing system.
> 2. A normal user must be able to compile a customized kernel, then submit
> a job with a variable pointing to that kernel. The batch queuing system
> must then load that kernel onto the allocated nodes and reboot the nodes.
> 2a. If the node doesn't come back up after rebooting, the job must be
> canceled and the node rebuilt automatically with a stable kernel/image.
> 3. When the job is finished, the node must be rebuilt automatically using
> a stable kernel/image.
> The admins here have come up with a name for this: "user
> provisioned os imaging". Our gut feeling is that this can be done but it
> will be a highly customized system. What are everyone's thoughts,
> experiences, etc. concerning doing something like this?
> Brent Clements
> Linux Technology Specialist
> Information Technology
> Rice University
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf