[Beowulf] comparing MPI HPC interconnects: manageability?

Dan Kidger daniel.kidger at quadrics.com
Thu Feb 19 18:13:20 EST 2004


> While performance (latency, bandwidth) usually comes to the fore in
> discussions about high performance interconnects for MPI clusters, I'm
> curious as to what your experiences are from the standpoint of
> manageability -- NIC's and spines and switches all fail at one time or
> another, but I'd like input as to how individual products (Myrinet,
> Quadrics, Infiniband, etc) handle this.  In your clusters does the
> hardware replacement involve simple steps (swap out the NIC, rerun some
> config utilities) or something more complex (such as bringing down the
> entire high speed network to reconfigure it so all the nodes can talk to
> the new hardware); i.e., How painful is it to replace a single failed NIC?
> I'd imagine that most cluster admins are reluctant to interrupt running
> jobs in order to re-initialize the equipment after hardware replacement.
> Any information about how your clusters running high-speed interconnects
> handle interconnect hardware failure/replacement would be very helpful.

AFAIK all interconnects allow a NIC to be swapped without bringing down
the whole network - but in all cases any parallel job running on that node would
need to be aborted, since in general high-speed interconnect PCI cards are not
hot-swappable - that node would need to be power-cycled.

As for the cables and switches, I can't speak for other vendors - but, for example, a
line card in a Quadrics switch can be hot-swapped even while running MPI jobs are
sending data through that line card - the jobs simply pause until the cables are
reconnected. I would expect other interconnects to behave the same way in this
respect?


Dr. Dan Kidger, Quadrics Ltd.      daniel.kidger at quadrics.com
One Bridewell St., Bristol, BS1 2AA, UK         0117 915 5505
----------------------- www.quadrics.com --------------------

Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
