[Beowulf] Anyone have experience with Cisco 6509 switches for a cluster?

Brock Palen brockp at umich.edu
Thu Jun 16 10:20:00 EDT 2011


We had a 6509 in one of our older clusters and got rid of it.  Each group of 12 ports shares some amount of bandwidth, so the results you are seeing are expected; that is just how the switch is built.  Though I think you should not drop to 55 MB/s until 6 hosts are active on a group of 12 ports.
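
To make the shared-bandwidth arithmetic concrete, here is a rough sketch of an even-split model in Python. The function name and the 330 MB/s group capacity are assumptions for illustration only: the capacity is just 6 x 55 MB/s, back-calculated from the figures above, and is not a documented number for any 6509 line card. The 117 MB/s wire speed is the iperf number from the original post.

```python
def per_host_throughput(active_hosts, group_capacity_mb_s, wire_mb_s=117.0):
    """Estimate per-host throughput (MB/s) when hosts in one port group
    split a shared capacity evenly, capped at the per-port wire speed.

    This is a simplified fair-share model, not a description of how
    the switch actually schedules traffic.
    """
    if active_hosts < 1:
        raise ValueError("active_hosts must be at least 1")
    return min(wire_mb_s, group_capacity_mb_s / active_hosts)


# Hypothetical shared capacity for one 12-port group (assumption: 6 * 55 MB/s).
ASSUMED_GROUP_CAPACITY_MB_S = 330.0

if __name__ == "__main__":
    for hosts in range(1, 13):
        rate = per_host_throughput(hosts, ASSUMED_GROUP_CAPACITY_MB_S)
        print(f"{hosts:2d} active hosts in group: {rate:6.1f} MB/s each")
```

Under this model a drop to 55 MB/s with only 4 hosts would suggest either that all the hosts sit in one group with less shared capacity than assumed here, or that something other than per-group sharing is limiting the traffic. Spreading the iperf pairs across different port groups and re-running would help tell the two apart.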

At least in our case, if you ran a nasty version of Gromacs (one that performs very poorly but floods the network), our 6509 would lock up and we would have to restart every host connected to it.  We had to have a Cisco tech out, and after a day of monkeying around, a double super secret undocumented command fixed the lockup issue when pushing that many bytes.

Looks like they may have fixed that in newer versions.
Personally, this experience has kept me away from Cisco gear for clusters that don't have another network, like IB, to take most of the load.  That said, this was a few years ago now, and the situation with that hardware may be much better.

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
brockp at umich.edu
(734)936-1985



On Jun 15, 2011, at 10:27 AM, Joe Landman wrote:

> Hi folks
> 
>   A partner is rebuilding a beowulf rendering/post-production cluster, 
> and they swapped out some smaller switches for a larger Cisco 6509 
> switch frame.  They are encountering some issues, asked us for help. 
> Unfortunately, I know very little about these switches and IOS (not the 
> Apple bit), so I am hoping to get a pointer to what we should be looking 
> for.  If anyone is an IOS/Cisco expert that can help today, please 
> contact me offlist.
> 
>   Here's the problem:
> 
>   iperf between 2 nodes, wire speed.  117 MB/s.
> 
>   iperf between 4 nodes, 1/2 wire speed, or 55 MB/s.
> 
>   As they increase the number of pairs doing iperf, performance keeps 
> dropping.
> 
>   This suggests that all traffic is being serialized somehow, possibly 
> transiting a single interface.
> 
>   Anyone out there ever see something like this before?  Any clues as 
> to how to handle it?  Or how to diagnose/fix this?
> 
>   Thanks in advance.
> 
> Joe
> -- 
> Joseph Landman, Ph.D
> Founder and CEO
> Scalable Informatics Inc.
> email: landman at scalableinformatics.com
> web  : http://scalableinformatics.com
>        http://scalableinformatics.com/sicluster
> phone: +1 734 786 8423 x121
> fax  : +1 866 888 3112
> cell : +1 734 612 4615
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
> 
> 




