Fwd: [SSI] RFC: Etherboot/PXE to simplify installation and management
raysonlogin at yahoo.com
Fri Nov 9 11:48:47 EST 2001
--- "Brian J. Watson" <Brian.J.Watson at compaq.com> wrote:
> In an SSI cluster, it should only be necessary to install software
> on a single node. Most other nodes can be thin clients, using
> Etherboot or PXE to load their kernel and ramdisk from the
> CLMS master. A potential CLMS master node needs to have its kernel
> and ramdisk stored locally on a SCSI or IDE disk, in case it's
> the first node booted in the cluster. Even a potential CLMS master,
> however, can initially get its kernel and ramdisk via Etherboot/PXE
> and install them onto its hard disk with minimal sysadmin
> intervention.
>
> Etherboot is an open-source software package for creating ROM images
> that allow a computer to boot off the network using DHCP or BOOTP.
> For those who cannot or will not flash their ROM with one of these
> images, Etherboot includes a special boot block for loading the image
> from a floppy or hard drive. Etherboot appears to support about
> a hundred different NIC models. Unfortunately, it only supports
> the x86 platform right now.
> For more information, visit the Etherboot website:
> PXE (Preboot Execution Environment) is an Intel specification for
> doing pretty much the same thing. An advantage is that PXE images
> come pre-loaded on certain NICs, but I suspect most PXE images are
> closed source.
> To read Intel's PXE spec:
> To support this new dependent node booting model, changes to initial
> node installation would include:
> - Making sure dhcpd and tftpd are installed as part of the base
> Linux distribution.
> - Installing mknbi (part of Etherboot) on the shared root for
> building a tagged image of the kernel and ramdisk.
> - Adding an /etc/ssitab file for specifying the MAC address,
> IP address, node number, and local boot flag for each node
> allowed to join the cluster. For each node with the local boot
> flag set, a device for the boot partition must also be specified.
> The local boot flag should only be set for potential CLMS master
> nodes on the x86 platform. For platforms not supported by
> Etherboot/PXE, such as Alpha, _all_ nodes should have the local
> boot flag set.
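The proposal doesn't pin down a format for /etc/ssitab; one plausible layout is sketched below (field names, column order, and the "-" placeholder are my guesses, not part of the proposal):

```
# /etc/ssitab -- nodes allowed to join the cluster
# MAC address        IP address   node  localboot  bootdev
00:50:8b:ae:01:02    10.0.0.1     1     yes        /dev/sda1
00:50:8b:ae:03:04    10.0.0.2     2     no         -
```

A "-" in the bootdev column would mark nodes with no local boot partition.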
> - Eliminating /etc/cluster.conf, which is obsoleted by /etc/ssitab.
> - Installing a new mkdhcpd.ssi command that builds /etc/dhcpd.conf
> from the data in /etc/ssitab. To support non-SSI uses of DHCP,
> it copies anything it finds in /etc/dhcpd.proto before appending
> the generated lines.
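The mkdhcpd.ssi behavior described above can be sketched as a small shell function. The ssitab layout (MAC, IP, node number, local boot flag, boot device) and the details of the generated stanzas are my assumptions; the real tool had not been written when this was posted:

```shell
#!/bin/sh
# mkdhcpd_ssi: sketch of the proposed generator.  Copies the sysadmin's
# non-SSI DHCP settings from a proto file, then appends one host stanza
# per ssitab record.  The ssitab field layout is an assumption.
mkdhcpd_ssi() {
    ssitab=$1
    proto=$2

    # Preserve any non-SSI DHCP configuration kept in the proto file.
    [ -f "$proto" ] && cat "$proto"

    # Skip comments; fields assumed: MAC IP node-number localboot bootdev.
    awk '!/^#/ && NF >= 3 {
        printf "host node%s {\n", $3
        printf "    hardware ethernet %s;\n", $1
        printf "    fixed-address %s;\n", $2
        printf "    filename \"/tftpboot/vmlinuz.nbi\";\n"
        printf "}\n"
    }' "$ssitab"
}
```

Running something like `mkdhcpd_ssi /etc/ssitab /etc/dhcpd.proto > /etc/dhcpd.conf` would then regenerate the config without clobbering hand-maintained entries.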
> - Installing a new lilo.ssi command that does the following:
> * reads /etc/lilo.conf and /etc/ssitab, and uses onnode
> to sync the default kernel and ramdisk out to all potential
> CLMS master nodes that are up with the local boot flag set
> * runs mknbi to generate a tagged image of the default kernel
> and ramdisk in /tftpboot/, so that dependent nodes can
> download it while booting
> In addition, changes will have to be made to the ramdisk, which means
> changes to the mkinitrd.ssi script:
> - Copy /etc/ssitab into the ramdisk.
> - Enhance /linuxrc to match a local MAC address to an entry in
> /etc/ssitab to determine the local IP address and node number.
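The matching step might look like the following (again assuming the ssitab layout of MAC, IP, node number, local boot flag, boot device; the real /linuxrc would pull the MAC from the NIC it discovered at boot):

```shell
# lookup_node: given this node's MAC address and the path to ssitab,
# print its IP address, node number, and local-boot flag.  The field
# layout (MAC IP node localboot bootdev) is an assumption.
lookup_node() {
    mac=$1
    ssitab=$2
    # Compare MACs case-insensitively; stop at the first match.
    awk -v want="$mac" '
        BEGIN { want = tolower(want) }
        !/^#/ && tolower($1) == want { print $2, $3, $4; exit }
    ' "$ssitab"
}
```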
> - If the local boot flag is set, then /linuxrc compares the default
> kernel and ramdisk on the shared root to those on the local disk.
> If they differ, it runs lilo.ssi with a special flag to just sync
> the local disk.
> - The hack in VI.3 of the installation instructions will go away.
> Dave Zafman and I cooked up a scheme for /linuxrc to read
> /proc/partitions and make all the devices it finds there.
> That removes the need for the sysadmin to figure out the local
> device names of the two GFS partitions.
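That /proc/partitions scheme can be sketched as below. It prints the mknod commands as a dry run so it can be exercised without root; /linuxrc itself would run mknod directly:

```shell
# make_devs: for each block device in a /proc/partitions-style file,
# emit the mknod command that would create its device node.
make_devs() {
    partitions=$1    # normally /proc/partitions
    # The file starts with a "major minor  #blocks  name" header and a
    # blank line; data rows have exactly four fields.
    awk 'NR > 2 && NF == 4 {
        printf "mknod /dev/%s b %s %s\n", $4, $1, $2
    }' "$partitions"
}
```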
> - As well as building the ramdisk, mkinitrd.ssi also runs
> mkdhcpd.ssi, since the sysadmin likely changed /etc/ssitab.
> Adding new nodes -- this is the beautiful part:
> - Make sure there are enough available journals for the new nodes
> on the GFS shared root. Note that the Cluster Filesystem (CFS)
> that Dave is porting doesn't have this requirement, which makes
> it better suited for large clusters.
> - Edit /etc/ssitab to add records for each new node. The MAC
> address can be determined by booting the new node with an
> Etherboot floppy or ROM image. Although the DHCP server will
> not respond to this unknown MAC address just yet, the node will
> display on its console the MAC address of the card it discovered.
> - Run mkinitrd.ssi to rebuild the SSI ramdisk and /etc/dhcpd.conf.
> - Run lilo.ssi to distribute the new ramdisk to all nodes that are
> up with the local boot flag set, and to rebuild the tagged image
> in /tftpboot/.
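Taken together, the steady-state workflow for the sysadmin reduces to three steps. A hypothetical transcript (command names are from the proposal; any flags are omitted since they weren't specified):

```
# 1. Describe the new node(s):
vi /etc/ssitab        # add MAC, IP, node number, local boot flag

# 2. Rebuild the SSI ramdisk and /etc/dhcpd.conf:
mkinitrd.ssi

# 3. Sync local-boot nodes and rebuild the tagged image in /tftpboot/:
lilo.ssi
```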
> - If a new node does not have the local boot flag set, just boot it
> with the appropriate Etherboot/PXE ROM image or floppy, and
> it'll join the cluster.
> - If the local boot flag is set, and the platform is x86, boot it
> with the ROM image or floppy. While running /linuxrc, it'll sync
> the local disk if the boot partition has already been created.
> - If the boot partition has not been created, /linuxrc will proceed
> with joining the cluster. Once it has joined, run fdisk and mkfs
> to set up the boot partition. Then reboot the node one more time
> with the ROM image or floppy, so it can sync the local disk the
> next time it joins.
> - On a platform that does not support Etherboot/PXE, the PITA
> is a bit higher for adding a new node (which must have the
> local boot flag set). To avoid needless installation of the base
> OS, try booting off a distribution CD into rescue mode. Use fdisk
> and mkfs to set up the boot partition. Mount it. Either use a
> floppy or set up networking to copy the default kernel and
> ramdisk from the cluster to the boot partition. Also, copy
> over the configuration for your bootloader (e.g., aboot),
> and run it to install the boot block. Now it's ready to
> join the cluster. Finally,
> consider adding support for your platform to Etherboot or an
> equivalent software package.
> Some weaknesses in this proposal are support for non-x86 platforms,
> to which I've given some thought, and support for User Mode Linux,
> to which I've given very little thought. There are probably other
> weaknesses, but overall I think this improves the installation and
> management of OpenSSI on the x86 platform.
> Suggestions are definitely welcome, especially since I haven't
> started the implementation yet. ;)
> Brian Watson | "Now I don't know, but I been told it's
> Linux Kernel Developer | hard to run with the weight of gold,
> Open SSI Clustering Project | Other hand I heard it said, it's
> Compaq Computer Corp | just as hard with the weight of lead."
> Los Angeles, CA | -Robert Hunter, 1970
> mailto:Brian.J.Watson at compaq.com
> ssic-linux-devel mailing list
> ssic-linux-devel at lists.sourceforge.net
Beowulf mailing list, Beowulf at beowulf.org