Dell 1600SC + 82540EM poor performance..HELP NEEDED

Thu Jul 24 11:42:04 EDT 2003

On Thu, 24 Jul 2003 Stephane.Martin at imag.fr wrote:

> Hello, 
> 
> For our tests we are connected to a 4108GL (J4865A). We have done all
> the necessary checks (maybe we have forgotten something very, very big?)
> to ensure the validity of our measurements. The ports have been tested
> with auto-negotiation on, then off, and also forced. We get the same
> measurements when connected to a J4898A. The negotiation between the
> NICs and the two switches is working.
> 
> When using a Tyan motherboard with the 82540EM built in, with the same
> benchmarks, switches, and procedures (driver updates and builds from
> Intel, various benchmarks, different OSes), the results are correct
> (80 to 90 MB/s).
> 
> All our tests tend to show that Dell missed something in the
> integration of the 82540EM in the 1600SC series. If not, we would
> really appreciate knowing what we are missing, because we have a
> 150,000 dollar cluster, said to be connected by a gigabit network,
> with the network performance of three bonded 100 Mbit cards (in full
> duplex it is even worse!). If the problem is not solved quickly, the
> 48 machines will be returned.

I'd take the switch out of the picture first. See what you get
back-to-back by connecting one node directly to another.

While the 4108GL is great for management networks, it is not a high
performance switch. Wait till you fire up all 48 with PMB.

Your bisection bandwidth is not going to be great, but you should still
be able to hit decent numbers with a limited number of machines. It's
possible that broadcast and multicast traffic are interfering with your
runs.

So first remove the switch. If you get the performance you are looking for
point-to-point, then you can focus your efforts on the switch. 
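
As a rough illustration of the point-to-point check, here is a minimal Python sketch that pushes a fixed number of bytes over one TCP connection and reports MB/s. It is not a substitute for a real benchmark such as netperf or PMB. The function names, the chunk size and the loopback demo are assumptions made for this example. For a real back-to-back run you would start the receiver on one node and point the sender at that node's address.

```python
import socket
import threading
import time

CHUNK = 64 * 1024  # bytes per send call (arbitrary choice for this sketch)


def receive_all(server_sock, result):
    """Accept one connection and count bytes until the sender closes it."""
    conn, _ = server_sock.accept()
    received = 0
    with conn:
        while True:
            data = conn.recv(CHUNK)
            if not data:
                break
            received += len(data)
    result["bytes"] = received


def send_bytes(host, port, total_bytes):
    """Send total_bytes of zeros to host:port over a single TCP connection."""
    payload = b"\0" * CHUNK
    sent = 0
    with socket.create_connection((host, port)) as sock:
        while sent < total_bytes:
            n = min(CHUNK, total_bytes - sent)
            sock.sendall(payload[:n])
            sent += n


def loopback_demo(total_bytes=8 * 1024 * 1024):
    """Run receiver and sender on 127.0.0.1; return (bytes received, MB/s)."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]
    result = {}
    receiver = threading.Thread(target=receive_all, args=(server, result))
    receiver.start()
    start = time.perf_counter()
    send_bytes("127.0.0.1", port, total_bytes)
    receiver.join()  # stop the clock only once every byte has arrived
    elapsed = time.perf_counter() - start
    server.close()
    return result["bytes"], total_bytes / elapsed / 1e6


if __name__ == "__main__":
    got, rate = loopback_demo()
    print(f"received {got} bytes at {rate:.1f} MB/s")
```

Loopback numbers say nothing about the NIC. The demo only shows that the measurement itself works before you run it across the crossover link.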

Twice I've had 4108GLs that would experience a severe performance hit
when passing any traffic through a certain blade. The first time it was a
fast ethernet blade in slot "C". Any network traffic that hit a port on
this blade was severely degraded. We swapped it with a blade from a
different slot, and the problem stayed with the slot rather than following
the blade. A firmware update solved the issue.

The second time it was with a gig-E blade in slot "F". Again, any network
traffic that hit a port on this blade was severely degraded (similar to
what you're seeing now). This time, a firmware update did not fix it, but
swapping it with another gig-E blade from another 4108GL worked fine. The
"problem" blade also worked fine in the other 4108.

Targeting Pallas PMB to run on specific nodes based on the topology of the 
switch can sure tell one a lot about a switch...:)
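
One way to pick those node pairs, sketched in Python: group the nodes by the switch slot they are cabled to, then benchmark same-blade pairs and cross-blade pairs separately. The node names and the slot map below are hypothetical. Fill in your own wiring and feed each pair to whatever benchmark you use.

```python
from itertools import combinations


def pairs_by_blade(node_to_blade):
    """Split all node pairs into same-blade and cross-blade lists."""
    same, cross = [], []
    for a, b in combinations(sorted(node_to_blade), 2):
        if node_to_blade[a] == node_to_blade[b]:
            same.append((a, b))
        else:
            cross.append((a, b))
    return same, cross


# Hypothetical wiring: which switch slot each node is plugged into.
example = {"node01": "E", "node02": "E", "node03": "F", "node04": "F"}
same, cross = pairs_by_blade(example)

if __name__ == "__main__":
    print("same blade: ", same)
    print("cross blade:", cross)
```

If every pair touching one slot is slow while the others are fine, that points at the blade or slot rather than at the NICs.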

Good luck,

-- 
Rocky McGaugh
Atipa Technologies
rocky at atipatechnologies.com
rmcgaugh at atipa.com
1-785-841-9513 x3110
http://67.8450073/
perl -e 'print unpack(u, ".=W=W+F%T:7\!A+F-O;0H`");'
