[Beowulf] comparing MPI HPC interconnects: manageability?

Thu Feb 19 15:19:56 EST 2004

Hi all,

While performance (latency, bandwidth) usually comes to the fore in
discussions about high performance interconnects for MPI clusters, I'm
curious as to what your experiences are from the standpoint of
manageability -- NICs, spines, and switches all fail at one time or
another, but I'd like input as to how individual products (Myrinet,
Quadrics, InfiniBand, etc.) handle this.  In your clusters, does the
hardware replacement involve simple steps (swap out the NIC, rerun some
config utilities) or something more complex (such as bringing down the
entire high speed network to reconfigure it so all the nodes can talk to
the new hardware)?  In short, how painful is it to replace a single
failed NIC?

I'd imagine that most cluster admins are reluctant to interrupt running
jobs in order to re-initialize the equipment after hardware replacement.
Any information about how your clusters running high-speed interconnects
handle interconnect hardware failure/replacement would be very helpful.

Thanks,

Dave Stirling
Brigham Young University
