[Beowulf] motherboards for diskless nodes

Craig Tierney ctierney at hpti.com
Fri Feb 25 18:02:06 EST 2005

On Fri, 2005-02-25 at 15:18, Mark Hahn wrote:
> > > Reasons to run disks for physics work.
> > > 1. Large tmp files and checkpoints.
> > 
> > Good reason, except when a node fails you lose your checkpoints.
> you mean s/node/disk/, right?  sure, but doing raid1 on a "diskless"
> node is not insane.  though frankly, if your disk failure rate is 
> that high, I'd probably do something like intermittently store
> checkpoints off-node.

Yes and no.  If the whole node is down, it is a bit tough for your
model to make progress no matter what shape its disks are in.  RAID1
works well enough in software that you don't need any additional
hardware beyond the second disk.
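
FWIW, "intermittently store checkpoints off-node" can be as simple as
writing every checkpoint to local scratch and mirroring every Nth one
to an NFS path.  A minimal sketch, with made-up paths and interval
(the actual checkpoint bytes would come from your model):

    import os, shutil

    LOCAL_SCRATCH = "/scratch/ckpt"         # fast local disk (made up)
    OFFNODE_DIR   = "/net/fileserver/ckpt"  # NFS mount, survives a node loss
    STAGE_EVERY   = 10                      # mirror every 10th checkpoint

    def write_checkpoint(step, data):
        # the common case only touches local disk
        local = os.path.join(LOCAL_SCRATCH, "ckpt_%06d" % step)
        with open(local, "wb") as f:
            f.write(data)
        # every STAGE_EVERY steps, copy the checkpoint to the file server
        if step % STAGE_EVERY == 0:
            shutil.copy(local, os.path.join(OFFNODE_DIR, os.path.basename(local)))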

> > It all ends up being a risk assessment.  We have been up for close
> > to 6 months now.  We have not had a failure of the NFS server.  The
> I have two nothing-installed clusters; one in use for 2+ years,
> the other for about 8 months.  the older one has never had an
> NFS-related problem of any kind (it's a dual-xeon with 2 u160
> channels and 3 disks on each; other than scsi, nothing gold-plated.)
> this cluster started out with 48 dual-xeons and a single 48-port
> 100bT switch with a gigabit uplink.
> the newer cluster has been noticeably less stable, mainly because
> I've been lazy.  in this cluster, there are 3 racks of 32 dual-opterons 
> (fc2 x86_64) that netboot from a single head node.  each rack has a 
> gigabit switch which is 4x LACP'ed to a "top" switch, which has 
> one measly gigabit to the head/fileserver.  worse yet, the head/FS
> is a dual-opteron (good), but running a crappy old 2.4 ia32 kernel.
> as far as I can tell, you simply have to think a bit about the 
> bandwidths involved.  the first cluster has many nodes connected
> via thin pipes, aggregated through a switch to gigabit
> connecting to decent on-server bandwidth.
> the second cluster has lots more high-bandwidth nodes, connected 
> through 12 incoming gigabits, bottlenecked down to a single 
> connection to the head/file server (which is itself poorly configured).
> one obvious fix to the latter is to move some IO load onto
> a second fileserver, which I've done.  great increase in stability,
> though enough IO from enough nodes can still cause problems.
> shortly I'll have logins, home directories and work/scratch all on 
> separate servers.
> for a more scalable system, I would put a small fileserver in each rack, 
> but still leave the compute nodes nothing-installed.  I know that 
> the folks at RQCHP/Sherbrooke have done something like this, very nicely,
> for their serial farm.  it does mean you have a potentially significant
> number of other servers to manage, but they can be identically configured.
> heck, they could even net-boot and just grab a copy of the compute-node
> filesystems from a central source.  the Sherbrooke solution involves 
> smart automation of the per-rack server for staging user files as well
> (they're specifically trying to support parameterized monte carlo runs.)
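
To put rough numbers on "think a bit about the bandwidths involved",
here is a back-of-the-envelope sketch (nominal link rates only, so
real NFS throughput is lower; node counts are the ones quoted above):

    # per-node share of the file server's link if every node reads at once
    def per_node_share_mbit(nodes, server_link_mbit=1000):
        return server_link_mbit / float(nodes)

    # cluster 1: 48 nodes on 100bT behind one gigabit uplink to the server
    print(per_node_share_mbit(48))   # ~21 Mbit/s each, vs. a 100 Mbit NIC
    # cluster 2: 96 gigabit nodes, 12 gigabits in, one gigabit to the server
    print(per_node_share_mbit(96))   # ~10 Mbit/s each, ~1% of a gigabit NIC

Nodes in the first cluster can still get about a fifth of what their
NICs can carry; nodes in the second get about a hundredth, which goes
a long way toward explaining why it is less stable under heavy IO.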

Sandia does something similar to the per-rack server idea with their
CIT toolkit, but everything is still diskless.  For every N compute
nodes there is an NFS redirector: it boots diskless itself, and caches
all of the files that its clients read, so the clients hit the
redirector rather than the main filesystem.
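
(Not the CIT code, obviously; just the shape of the idea.  Each
redirector keeps a read-through copy of whatever its clients ask for,
so the central server sees a given file roughly once per redirector
rather than once per node.  The paths here are invented:)

    import os, shutil

    MAIN_FS = "/net/mainfs"       # mount of the central file server
    CACHE   = "/var/cache/redir"  # redirector-local storage (disk or ramdisk)

    def cached_open(relpath):
        """Serve a local copy of the file, fetching it on first use."""
        local = os.path.join(CACHE, relpath)
        if not os.path.exists(local):
            os.makedirs(os.path.dirname(local), exist_ok=True)
            shutil.copy(os.path.join(MAIN_FS, relpath), local)  # one read from the main FS
        return open(local, "rb")  # later reads never leave the redirector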

If you do have a disk in these nodes, there are probably some
interesting things you can do with CacheFS when it becomes stable.


Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
