Cisco switches for lam mpi

Douglas O'Flaherty douglas at shore.net
Wed Jul 30 16:44:58 EDT 2003


    From: "Jack Douglas" <jd89313 at hotmail.com>

    To: beowulf at beowulf.org

    Subject: Cisco switches for lam mpi
    Date: Tue, 29 Jul 2003 16:37:37 +0000 

    Hi

    I wonder if someone can help me

    We have just installed a 32 Node Dual Xeon Cluster, with a Cisco
    Catalyst 4003 Chassis with 48 1000BASE-T ports.

    We are running LAM MPI over gigabit, but we seem to be experiencing
    bottlenecks within the switch

    Typically, using the cisco, we only see CPU utilisation of around
    30-40%

    However, we experimented with a Foundry Switch, and were seeing CPU
    utilisation on the same job of around 80-90%.

    We know that there are commands to "open" the Cisco, but the ones we
    have been advised of don't seem to do the trick.

    Was the Cisco a bad idea? If so can someone recommend a good Gigabit
    switch for MPI? I have heard HP ProCurves are supposed to be pretty good.

    Or does anyone know any other commands that will open the Cisco switch
    further getting the performance up

    Best Regards

    JD

==============

Jack:

Have you run Pallas' MPI benchmarks 
(http://www.pallas.com/e/products/pmb/) to quantify the differences 
between the two switches? The dramatic difference in system performance 
suggests you have something going wrong there.  You should test under no 
load and under load. The difference may be illuminating.

I'd start with the assumption that you have something wrong on the Cisco. 
And I'd call whoever you bought it from to come and show otherwise.

Make certain you check the counters on the switch (and on a few systems) 
to see if you have collisions, overruns or any other issues. As noted on 
this list before, Cisco switches can have pathological problems with 
auto-negotiation, and a duplex mismatch shows up as collisions and errors 
on those counters. You should be certain the ports are running Full 
Duplex to get the speed up. With GigE, jumbo frames increase performance 
a bit. Depending on your setup, I'd also turn off spanning tree and 
eliminate any ACLs, SNMP counters etc. which may be on the switch and 
contributing to load.
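
For the host side of that counter check, here is a minimal sketch, assuming 
the nodes run Linux and expose the usual /proc/net/dev table (16 numeric 
columns per interface). The function names are mine and purely illustrative; 
the switch-side counters still have to be read on the switch itself.

```python
# Sketch only: assumes the standard Linux /proc/net/dev layout, i.e. two
# header lines followed by "iface: <8 receive columns> <8 transmit columns>".

def parse_proc_net_dev(text):
    """Return {iface: {counter: value}} for the error-related columns."""
    stats = {}
    for line in text.splitlines()[2:]:      # skip the two header lines
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        f = [int(x) for x in data.split()]
        if len(f) < 16:                      # unexpected layout, skip it
            continue
        stats[name.strip()] = {
            "rx_errs": f[2], "rx_drop": f[3], "rx_fifo": f[4],
            "rx_frame": f[5],
            "tx_errs": f[10], "tx_drop": f[11], "tx_fifo": f[12],
            "tx_colls": f[13], "tx_carrier": f[14],
        }
    return stats


def suspicious(stats):
    """Keep only interfaces with non-zero error counters, and only those counters."""
    return {
        iface: {k: v for k, v in counters.items() if v}
        for iface, counters in stats.items()
        if any(counters.values())
    }


def read_host_counters(path="/proc/net/dev"):
    """Read and filter the counters on the local node."""
    with open(path) as fh:
        return suspicious(parse_proc_net_dev(fh.read()))
```

Calling read_host_counters() on a few nodes before and after a job, and 
comparing the two results, shows whether errors or collisions grow under load.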

Worst case would be being backplane constrained - you have 32 GigE 
nodes. The Supervisor Engine in the Cisco is listed as a 24-Gbps 
forwarding engine (18 million packets/sec) at peak. The Foundry NetIron 
400 & 800 backplane is 32 Gbps+ and they say 90 Mpps peak. Notice that 
the math to convert between packet rate and backplane speed doesn't work 
out the same way for the two vendors, so the quoted figures are hard to 
compare directly. My experience is that the Foundry is always faster and 
has lower latency. 
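
To make the arithmetic concrete, here is a rough sketch using only the 
figures quoted above. The helper names are illustrative, and whether a 
vendor's backplane number counts one direction or both is an assumption I 
have not checked against either data sheet.

```python
# Figures quoted in this message; everything else is an illustrative assumption.
NODES = 32          # GigE-attached cluster nodes
PORT_GBPS = 1.0     # nominal GigE line rate, per direction


def implied_bytes_per_packet(gbps, mpps):
    """Average frame size at which a Gbps figure and an Mpps figure would agree."""
    bits_per_packet = (gbps * 1e9) / (mpps * 1e6)
    return bits_per_packet / 8


def worst_case_demand_gbps(nodes=NODES, port_gbps=PORT_GBPS, full_duplex=True):
    """Aggregate traffic if every node sends (and, for full duplex, receives) at line rate."""
    return nodes * port_gbps * (2 if full_duplex else 1)


cisco = implied_bytes_per_packet(24, 18)     # about 167 bytes per packet
foundry = implied_bytes_per_packet(32, 90)   # about 44 bytes per packet

print(f"Cisco:   {cisco:.1f} bytes/packet implied")
print(f"Foundry: {foundry:.1f} bytes/packet implied")
print(f"Worst-case demand: {worst_case_demand_gbps():.0f} Gbps both directions, "
      f"{worst_case_demand_gbps(full_duplex=False):.0f} Gbps one direction")
```

The Foundry figures imply an average packet smaller than the 64-byte 
Ethernet minimum frame, so the Gbps and Mpps numbers are evidently not 
measured on the same basis. Either way, 32 nodes at line rate can ask for 
more than the 24 Gbps listed for the Cisco.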

I have little experience with the HP ProCurve switches. I've used them 
in data closets where backplane speed is not an issue. They've been 
reliable, but I've never considered them for a high-speed network core.

doug

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf


