[Beowulf] HPC fault tolerance using virtualization
jmdavis1 at vcu.edu
Tue Jun 16 15:06:39 EDT 2009
John Hearns wrote:
> 2009/6/16 Egan Ford <egan at sense.net>
> I have no idea the state of VMs on IB. That can be an issue with
> MPI. Believe it or not, but most HPC sites do not use MPI. They
> are all batch systems where storage I/O is the bottleneck.
> Burn the Witch! Burn the Witch!
> Any HPC installation, if you want to show it off to alumni, august
> committees from grant awarding bodies etc. and not get sand kicked in
> your face from the big boys in the Top 500 NEEDS an expensive
> infrastructure of various MPI libraries. Big, big switches with lots
> of flashing lights. Highly paid, pampered systems admins who must be
> treated like expensive racehorses, and not exercised too much every
> day. They need cool beers on tap and luxurious offices to relax in
> while they prepare to do that vital half hours work per day which
> keeps your Supercomputer flashing away and making noises.
I realize that this is humor, but one must remember just how sensitive
System Admins can be before making such statements. I would like to
refer you to the BOFH (Bastard Operator from Hell) or, as I like to call
it, the SysAdmin's guide to interpersonal relationships. Remember what
these people do and, more importantly, what they can do.
On a serious note, who else gets out of bed at 3 am because an
automated system indicates an issue with an HPC research cluster, or the
Computing Center calls because fresh water has been cut off and the
building is warming, or you get the call that the water pumps (dual for
redundancy but sharing one controller, now that's engineering) have
failed, or that machine room power is dirty because 1/2 of the battery
bank has shorted and the other half can't supply all of the needed clean
power, etc.?
In my experience, Sysadmins don't want beer or luxurious offices; they
want the tools that they need, proper managerial support, and respect.
Mike Davis                Technical Director
(804) 828-3885            Center for High Performance Computing
jmdavis1 at vcu.edu       Virginia Commonwealth University
"Never tell people how to do things. Tell them what to do and they will surprise you with their ingenuity." George S. Patton