Cisco switches for lam mpi
douglas at shore.net
Wed Jul 30 16:44:58 EDT 2003
From: "Jack Douglas" <jd89313 at hotmail.com>
To: beowulf at beowulf.org
Subject: Cisco switches for lam mpi
Date: Tue, 29 Jul 2003 16:37:37 +0000
I wonder if someone can help me.
We have just installed a 32-node dual-Xeon cluster with a Cisco
4003 chassis with 48 1000Base-T ports.
We are running LAM/MPI over gigabit, but we seem to be experiencing
bottlenecks within the switch.
Typically, using the Cisco, we only see CPU utilisation of around
However, we experimented with a Foundry switch and were seeing CPU
utilisation on the same job of around 80-90%.
We know that there are commands to "open up" the Cisco, but the ones we
have been advised of don't seem to do the trick.
Was the Cisco a bad idea? If so, can someone recommend a good Gigabit
switch for MPI? I have heard HP ProCurves are supposed to be pretty good.
Or does anyone know any other commands that will open up the Cisco switch
further and get the performance up?
Have you run Pallas' MPI benchmarks
(http://www.pallas.com/e/products/pmb/) to quantify the differences
between the two switches? The dramatic difference in system performance
suggests you have something going wrong there. You should test under no
load and under load. The difference may be illuminating.
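As a sketch of what that test run could look like under LAM/MPI (host file name, binary path, and node count are examples; PMB-MPI1 is the benchmark binary built from the Pallas distribution):

```shell
# Start the LAM run-time on the nodes listed in the host file.
lamboot hostfile

# Unloaded case: raw latency/bandwidth between two nodes.
mpirun -np 2 ./PMB-MPI1 PingPong

# Loaded case: all 32 nodes exchanging data stresses the switch fabric.
mpirun -np 32 ./PMB-MPI1 Alltoall

# Shut the LAM run-time down when finished.
lamhalt
```

Comparing the two-node PingPong numbers against the full Alltoall run on each switch is one way to separate per-port performance from fabric congestion.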
I'd start with the assumption that something may be wrong on the Cisco,
and I'd call whoever you bought it from to come show otherwise.
Make certain you check your counters on the switch (and a few systems)
to see if you have collisions, overruns, or any other issues. As noted on
this list before, the Ciscos can have pathological problems with
auto-negotiation. You should be certain to set the ports to full duplex
to get the speed up. With GigE, jumbo frames increase performance by a
bit. Depending on your setup, I'd also turn off spanning tree and
eliminate any ACLs, SNMP counters, etc. which may be on the switch and
contributing to load.
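A sketch of the per-port checks and settings involved, assuming the 4003 is running CatOS (module/port numbers and the node-side interface name are examples; exact commands vary by supervisor and software version):

```shell
# On the Catalyst console: look for errors on a node-facing port.
show port counters 2/1

# Pin duplex rather than trusting auto-negotiation (if the port allows it).
set port duplex 2/1 full

# Skip the spanning-tree listening/learning delay on host-facing ports.
set spantree portfast 2/1 enable

# On each Linux node: enable jumbo frames, if the NIC and switch support them.
ifconfig eth0 mtu 9000
```

Check the counters again after a benchmark run; rising error counts under load point at the switch rather than the MPI layer.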
Worst case would be being backplane-constrained - you have 32 GigE
nodes. The Supervisor Engine in the Cisco is listed as a 24-Gbps
forwarding engine (18 million packets/sec) at peak. The Foundry NetIron
400 & 800 backplane is 32 Gbps+ and they say 90 Mpps peak. Notice the
math to convert between packets and backplane speed doesn't work. My
experience is that the Foundry is always faster and has lower latency.
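To make that mismatch concrete, here is the arithmetic at 64-byte minimum Ethernet frames (a simplifying assumption; preamble and inter-frame gap are ignored):

```python
# Throughput implied by a packets/sec rating at 64-byte frames.
FRAME_BITS = 64 * 8

def pps_to_gbps(pps):
    """Gbps implied by a packets-per-second figure at minimum frame size."""
    return pps * FRAME_BITS / 1e9

cisco_gbps = pps_to_gbps(18e6)    # Supervisor rated 18 Mpps
foundry_gbps = pps_to_gbps(90e6)  # NetIron rated 90 Mpps

print(f"Cisco:   {cisco_gbps:.2f} Gbps implied vs 24 Gbps claimed")
print(f"Foundry: {foundry_gbps:.2f} Gbps implied vs 32 Gbps claimed")
```

The Cisco's pps rating implies about 9.2 Gbps at minimum frame size, well under its 24-Gbps claim, while the Foundry's implies about 46 Gbps, well over its 32-Gbps backplane - the pps and Gbps figures only agree at some vendor-chosen average frame size, and those sizes differ between the two.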
I have little experience with the HP ProCurve switches. I've used them
in data closets where backplane speed is not an issue. They've been
reliable, but I've never considered them for a high speed network core.
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf