[Beowulf] Hypothetical Situation
Robert G. Brown
rgb at phy.duke.edu
Thu Jan 22 13:00:11 EST 2004
On Thu, 22 Jan 2004, Brent M. Clements wrote:
> We have a request by some of our research scientists to form a beowulf
> cluster that provides the following abilities:
> 1. It must be part of a shared computing facility controlled by a batch
> queuing system.
> 2. A normal user must be able to compile a customized kernel, then submit
> a job with a variable pointing to that kernel. The batch queuing system
> must then load that kernel onto the allocated nodes and reboot them.
> 2a. If the node doesn't come back up after rebooting, the job must be
> canceled and the node rebuilt automatically with a stable kernel/image.
> 3. When the job is finished, the node must be rebuilt automatically using
> a stable kernel/image.
> The admins here have come up with a name for this: "user
> provisioned OS imaging". Our gut feeling is that this can be done, but it
> will be a highly customized system. What are everyone's thoughts,
> experiences, etc. concerning doing something like this?
See the "COD" (Cluster on Demand) project in the Duke CPS department
(Jeff Chase, primarily, along with his student Justin Moore). It is
PRECISELY what you are looking for. I mean precisely. I don't know
the current status of the project, but that's just what COD is designed
for. Justin Moore, BTW, is the ex-protege of this list's own Greg
Lindahl, and a very savvy computer guy.
From its summary:
Users of a shared cluster should be free to select the software
environments that best support their needs. Cluster-on-Demand (COD) is
a system to enable rapid, automated, on-the-fly partitioning of a
physical cluster into multiple independent virtual clusters. A virtual
cluster (vcluster) is a group of machines (physical or virtual)
configured for a common purpose, with associated user accounts and
storage resources, a user-specified software environment, and a
private IP address block and DNS naming domain. COD vclusters are
dynamic; their node allotments may change according to demand.
(This includes shipping each node its own custom kernel and filesystem
image -- full setup and boot into cluster, followed by teardown and
restoration of the resource to the pool when done, "arbitrary" kernels,
operating systems, and runtime node images.)
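To make the requested lifecycle concrete, here is a rough sketch of requirements 2, 2a, and 3 as a batch-system hook. This is not COD's actual code; every name in it (install_kernel, reboot, rebuild, and so on) is invented for illustration, and a real implementation would call out to the scheduler and an imaging tool:

```python
# Hypothetical sketch of the "user provisioned OS imaging" lifecycle
# from the quoted requirements.  All functions here are stubs that just
# record what "happened"; they stand in for real scheduler/imaging calls.

actions = []  # record of operations, for inspection

def install_kernel(node, kernel):
    actions.append(("install", node, kernel))

def reboot(node):
    actions.append(("reboot", node))

def rebuild(node, image):
    actions.append(("rebuild", node, image))

def execute_user_job(nodes):
    actions.append(("run", tuple(nodes)))

def run_job(nodes, user_kernel, stable_image, boot_ok):
    """Boot each allocated node into the user's kernel, run the job,
    then restore the stable image.  boot_ok(node) stands in for
    'did the node come back up within the timeout?' (requirement 2a)."""
    booted = []
    for node in nodes:
        install_kernel(node, user_kernel)   # requirement 2
        reboot(node)
        if not boot_ok(node):
            # requirement 2a: cancel the job and rebuild every
            # node that was touched with the stable image
            for n in booted + [node]:
                rebuild(n, stable_image)
            return "canceled"
        booted.append(node)
    execute_user_job(booted)
    for n in booted:                        # requirement 3
        rebuild(n, stable_image)
    return "completed"

# Happy path: both nodes boot, the job runs, nodes are restored.
print(run_job(["n1", "n2"], "user-2.4.24", "stable", lambda n: True))
# Failure path: n2 never returns, so the job is canceled and both
# touched nodes are rebuilt from the stable image.
print(run_job(["n1", "n2"], "user-2.4.24", "stable", lambda n: n != "n2"))
```

The point of the sketch is that the failure path (2a) must rebuild every node already switched to the user kernel, not just the one that hung, or the pool leaks nodes running untrusted images.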
I'm very peripherally connected to this project, as in we all talk about
it, but I haven't contributed any code, only (maybe) a very few ideas.
Justin is probably who you'd want to talk to; he's on this list, so
maybe he'll speak up if he's around.
Robert G. Brown http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567 Fax: 919-660-2525 email: rgb at phy.duke.edu
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf