[Beowulf] motherboards for diskless nodes

Mark Hahn hahn at physics.mcmaster.ca
Fri Feb 25 17:18:16 EST 2005

> > Reasons to run disks for physics work.
> > 1. Large tmp files and checkpoints.
> Good reason, except when a node fails you lose your checkpoints.

you mean s/node/disk/ right?  sure, but doing raid1 on a "diskless"
node is not insane.  though frankly, if your disk failure rate is 
that high, I'd probably do something like intermittently store
checkpoints off-node.
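
the off-node staging doesn't have to be fancy.  a hypothetical sketch
(paths, hostname and interval are made up; adjust for your job layout):

    #!/usr/bin/env python
    # hypothetical: push the newest local checkpoint off-node every so often,
    # so a dead node/disk costs at most one interval of work.
    import glob, os, subprocess, time

    CKPT_DIR = "/tmp/myjob"            # local scratch holding checkpoints
    REMOTE   = "head:/scratch/ckpt/"   # off-node destination (made up)
    INTERVAL = 1800                    # seconds between pushes

    while True:
        ckpts = glob.glob(os.path.join(CKPT_DIR, "checkpoint-*"))
        if ckpts:
            newest = max(ckpts, key=os.path.getmtime)
            subprocess.call(["rsync", "-a", newest, REMOTE])
        time.sleep(INTERVAL)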

> It all ends up being a risk assessment.  We have been up for close
> to 6 months now.  We have not had a failure of the NFS server.  The

I have two nothing-installed clusters; one in use for 2+ years,
the other for about 8 months.  the older one has never had an
NFS-related problem of any kind (it's a dual-xeon with 2 u160
channels and 3 disks on each; other than scsi, nothing gold-plated).
this cluster started out with 48 dual-xeons and a single 48-port
100bT switch with a gigabit uplink.

the newer cluster has been noticeably less stable, mainly because
I've been lazy.  in this cluster, there are 3 racks of 32 dual-opterons 
(fc2 x86_64) that netboot from a single head node.  each rack has a 
gigabit switch which is 4x LACP'ed to a "top" switch, which has 
one measly gigabit to the head/fileserver.  worse yet, the head/FS
is a dual-opteron (good), but running a crappy old 2.4 ia32 kernel.

as far as I can tell, you simply have to think a bit about the 
bandwidths involved.  the first cluster has many nodes connected
via thin pipes, aggregated through a switch onto a single gigabit
uplink that feeds decent on-server bandwidth.

the second cluster has lots more high-bandwidth nodes, connected 
through 12 incoming gigabits, bottlenecked down to a single 
connection to the head/file server (which is itself poorly configured).
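
to put rough numbers on it (just the node counts and link speeds
mentioned above, nothing measured):

    # back-of-envelope oversubscription at the fileserver link, in Mbit/s
    nodes1, link1, uplink1 = 48, 100, 1000     # cluster 1: 100bT nodes, 1 GbE uplink
    print("cluster 1: %d offered vs %d -> %.1f:1"
          % (nodes1 * link1, uplink1, nodes1 * link1 / float(uplink1)))

    nodes2, link2, fslink2 = 96, 1000, 1000    # cluster 2: GbE nodes, 1 GbE into the FS
    print("cluster 2: %d offered vs %d -> %.0f:1"
          % (nodes2 * link2, fslink2, nodes2 * link2 / float(fslink2)))

roughly 5:1 on the old cluster versus about 100:1 on the new one, which
lines up with the difference in stability.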

one obvious fix to the latter is to move some IO load onto
a second fileserver, which I've done.  great increase in stability,
though enough IO from enough nodes can still cause problems.
shortly I'll have logins, home directories and work/scratch all on 
separate servers.
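
on a nothing-installed node the mounts all come out of a generated fstab
anyway, so the split is just a table; a hypothetical sketch (server names
are made up):

    # hypothetical: emit nfs fstab lines for a diskless node, with home and
    # scratch served by different machines
    MOUNTS = {
        "/home":    "homesrv:/export/home",
        "/scratch": "scratchsrv:/export/scratch",
    }
    for mountpoint, export in MOUNTS.items():
        print("%s %s nfs rw,hard,intr 0 0" % (export, mountpoint))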

for a more scalable system, I would put a small fileserver in each rack, 
but still leave the compute nodes nothing-installed.  I know that 
the folks at RQCHP/Sherbrooke have done something like this, very nicely,
for their serial farm.  it does mean you have a potentially significant
number of other servers to manage, but they can be identically configured.
heck, they could even net-boot and just grab a copy of the compute-node
filesystems from a central source.  the Sherbrooke solution involves 
smart automation of the per-rack server for staging user files as well
(they're specifically trying to support parameterized monte carlo runs).
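
if I were doing the per-rack version, the rack server's boot script could
be nearly trivial; a hypothetical sketch (hostnames, paths and the subnet
are made up):

    # hypothetical: per-rack fileserver pulls the compute-node image from the
    # central source at boot, then re-exports it to its own rack's nodes
    import subprocess

    MASTER = "central:/export/node-root/"   # authoritative copy on the head node
    LOCAL  = "/export/node-root"            # local copy this rack serves

    # mirror the central image; --delete keeps the local copy exact
    subprocess.check_call(["rsync", "-a", "--delete", MASTER, LOCAL])

    # export it read-only to this rack's subnet
    subprocess.check_call(["exportfs", "-o", "ro", "10.1.0.0/24:" + LOCAL])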

regards, mark hahn.

