[Beowulf] Hypothetical Situation
Brent M. Clements
bclem at rice.edu
Thu Jan 22 18:33:12 EST 2004
We have actually come up with an idea called a "kernel preload" system,
based on ideas from a lot of places.
Basically what we have is as follows:
The system admins provide the source code to a bunch of "stock" kernels
known to work with the hardware we are using in the cluster.
The user takes a copy of this source code and makes his/her changes.
We have a script called skj (submit kernel job). This is a Perl
script that does the following:
1. Makes a reservation within the job scheduling system for 1 hour,
reserving whatever number of nodes the user requests.
2. Creates a patch from the user's modified source code and then builds
the modified kernel. Please note that the script runs tests to make sure
the patch does not remove any needed hardware drivers, so that the
machine will actually boot.
3. The script sets the new kernel up to be served over TFTP and modifies
the DHCP configuration so the reserved nodes boot the new kernel.
4. The machines are power-cycled/rebooted.
5. The script waits up to 15 minutes for the machines with the new
kernel to come back up. If after 15 minutes none of the machines has
responded, the script reblasts the system with the stock kernel and
outputs an error message. We will actually speed up the boot process by
installing LinuxBIOS, so our wait time will be more like 5 minutes.
6. If everything goes well, the script modifies the reservation and
either creates a PBS job script or gives the user an interactive session
on the reserved nodes.
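The steps above can be sketched as follows. This is a minimal, hypothetical Python sketch, not the actual skj script (which is Perl). Assumptions, all mine: the site-specific actions (pointing a node's DHCP/TFTP entry at a kernel, power-cycling, checking that a node is up, restoring the stock kernel) are passed in as callables; the driver check inspects the resulting kernel `.config` rather than the patch itself; and `REQUIRED_OPTIONS` is a placeholder list that would have to match the real cluster hardware.

```python
import time
from typing import Callable, Dict, List, Sequence, Set, Tuple

# Hypothetical placeholder: options the cluster hardware needs to boot.
# Assumes a tftp'd kernel with no initrd, so these must be built in (=y).
REQUIRED_OPTIONS = ["CONFIG_E1000", "CONFIG_BLK_DEV_SD", "CONFIG_EXT3_FS"]


def parse_kernel_config(text: str) -> Dict[str, str]:
    """Parse a kernel .config into {option: value}, skipping comment lines."""
    options: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # also skips "# CONFIG_FOO is not set"
        key, _, value = line.partition("=")
        options[key] = value
    return options


def missing_drivers(config_text: str,
                    required: Sequence[str] = REQUIRED_OPTIONS) -> List[str]:
    """Return the required options that are not built into the kernel."""
    options = parse_kernel_config(config_text)
    return [name for name in required if options.get(name) != "y"]


def wait_for_nodes(nodes: Sequence[str], is_up: Callable[[str], bool],
                   timeout_s: float, poll_s: float = 10.0,
                   clock: Callable[[], float] = time.monotonic,
                   sleep: Callable[[float], None] = time.sleep) -> Set[str]:
    """Poll until every node responds or timeout_s elapses; return those that came up."""
    deadline = clock() + timeout_s
    up: Set[str] = set()
    while True:
        up |= {n for n in nodes if n not in up and is_up(n)}
        if len(up) == len(nodes) or clock() >= deadline:
            return up
        sleep(poll_s)


def submit_kernel_job(nodes: Sequence[str], config_text: str,
                      install_kernel: Callable[[str], None],
                      reboot: Callable[[str], None],
                      is_up: Callable[[str], bool],
                      restore_stock: Callable[[str], None],
                      timeout_s: float = 15 * 60,
                      **wait_kwargs) -> Tuple[bool, Set[str]]:
    """Boot the reserved nodes on a user kernel; fall back if none come up."""
    missing = missing_drivers(config_text)
    if missing:
        raise ValueError(f"kernel config drops required drivers: {missing}")
    for node in nodes:
        install_kernel(node)  # hypothetical: point the node's boot entry at the new kernel
    for node in nodes:
        reboot(node)          # hypothetical: power-cycle via the site's power controller
    up = wait_for_nodes(nodes, is_up, timeout_s, **wait_kwargs)
    if not up:
        for node in nodes:
            restore_stock(node)  # hypothetical: re-point at the stock kernel and reboot
        return False, up
    return True, up
```

The post only specifies the fallback for the case where no machine responds, so the sketch returns the set of nodes that did come up and leaves partial failures to the caller.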
Voilà, we have solved our issue. This could also work if we wanted to
reblast an entirely new image onto the allocated nodes rather than just
the kernel. Remember, our requirement was that this needed to work easily
with our job schedulers (PBS/Maui).
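For the last step, where the script hands the booted nodes back to the batch system, a generated PBS job script might look roughly like this. It is a hypothetical sketch: the function name is mine, it assumes the Torque/OpenPBS `nodes=host1+host2` resource syntax and a one-hour walltime matching the reservation, and it does not attempt to bind the job to the Maui reservation itself.

```python
from typing import Sequence


def make_pbs_script(job_name: str, nodes: Sequence[str], command: str,
                    walltime: str = "01:00:00") -> str:
    """Return a PBS job script that runs `command` on the named nodes."""
    if not nodes:
        raise ValueError("need at least one node")
    lines = [
        "#!/bin/sh",
        f"#PBS -N {job_name}",
        # Request exactly the hosts that were booted on the user's kernel.
        f"#PBS -l nodes={'+'.join(nodes)}",
        f"#PBS -l walltime={walltime}",
        "cd $PBS_O_WORKDIR",  # PBS sets this to the directory qsub was run from
        command,
    ]
    return "\n".join(lines) + "\n"
```

The resulting text would be written to a file and submitted with `qsub`; pinning the hostnames is what keeps the job on the nodes running the test kernel.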
I am going to post a document outlining the requirements and the
solution next week, if anyone is interested.
Thanks to everyone who helped out!
Linux Technology Specialist
On Thu, 22 Jan 2004, Mark Hahn wrote:
> > You'll need to spend a few work weeks/months building some nice quasi
> > automated scripts and documentation to help the folks build their kernels
> > and put them into an appropriate form.
> I can't imagine why this would take even a day of work,
> assuming you use plain old PXE to boot the nodes, already
> have a table of MAC addrs, and are comfortable with simple
> suid script programming:
> pushkernel(machinename, kernelfile):
> $mac = grep $machinename mac-table-file
> ln -s $kernelfile /var/tftp/pxelinux.cfg/$mac
> powercontrol -r $machinename
> sleep 180
> if (cant ping $machinename)
> ln -s fallbackimage /var/tftp/pxelinux.cfg/$mac
> (powercontrol is a simple perl script I use to talk to my APC power thingies.)
> hooking this into a batch system shouldn't be hard.
> physically installing the kernel on a machine and booting it locally
> seems pointless. "lilo -R" is great for trying out new kernels, though.
> I understand someone has added this feature to grub, but haven't seen it myself.
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
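Mark's `pushkernel` pseudocode could be rendered in Python roughly as below. This is a hypothetical sketch with two deliberate changes. First, his version symlinks the kernel file into `pxelinux.cfg/`; PXELINUX actually reads small config files there (the per-node name is `01-` followed by the lowercase, dash-separated MAC), so the sketch writes a config that names the kernel instead. Second, the MAC table is assumed to be a plain dict, and the power-control and ping checks are passed in as callables standing in for his `powercontrol` Perl script. Site-specific `APPEND` lines (root device and so on) are omitted.

```python
import os
import tempfile
import time
from typing import Callable, Dict


def pxelinux_config_name(mac: str) -> str:
    """PXELINUX looks for pxelinux.cfg/01-<mac>, lowercase and dash-separated."""
    return "01-" + mac.lower().replace(":", "-")


def write_boot_entry(tftp_root: str, mac: str, kernel: str) -> str:
    """Write a per-node PXELINUX config that boots `kernel`; return its path."""
    cfg_dir = os.path.join(tftp_root, "pxelinux.cfg")
    os.makedirs(cfg_dir, exist_ok=True)
    path = os.path.join(cfg_dir, pxelinux_config_name(mac))
    with open(path, "w") as f:
        f.write(f"DEFAULT node\nLABEL node\n  KERNEL {kernel}\n")
    return path


def push_kernel(machine: str, kernel: str, mac_table: Dict[str, str],
                tftp_root: str, power_cycle: Callable[[str], None],
                is_up: Callable[[str], bool], fallback: str,
                wait_s: float = 180,
                sleep: Callable[[float], None] = time.sleep) -> bool:
    """Boot `machine` on `kernel`; re-point it at `fallback` if it does not answer."""
    mac = mac_table[machine]
    write_boot_entry(tftp_root, mac, kernel)
    power_cycle(machine)
    sleep(wait_s)
    if is_up(machine):
        return True
    # Like the pseudocode, this only re-points the boot entry; the node
    # picks up the fallback kernel on its next power cycle.
    write_boot_entry(tftp_root, mac, fallback)
    return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        print(write_boot_entry(root, "00:0A:95:9D:68:16", "kernels/bzImage-test"))
```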