Robert G. Brown
rgb at phy.duke.edu
Wed Nov 8 11:35:54 EST 2000
On Wed, 8 Nov 2000, Jon Tegner wrote:
> Thanks a lot for comments and suggestions!
> > I think that you have a problem with your switch:
> > The maximum aggregate bandwidth available is too low to sustain high
> > throughput transfer between more than 2 nodes. Which switch brand and
> > model do you use for your cluster?
> I have done some further testing, using NetPIPE
> (http://www.scl.ameslab.gov/Projects/ClusterCookbook/nprun.html). That
> application sends blocks of increasing size between two nodes, and
> there is a drastic reduction in throughput when the block size
> approaches about 5800 bytes. For 5780 bytes the speed is 67.45 Mbps and
> for 5800 bytes it is 1.34 Mbps. For larger sizes it stays low (at
> least up to 1e6 bytes, where I stopped).
> This is just for communication between two nodes, but since the
> communication passes through the switch (D-Link DES-3225G) it could
> still be the result of a poorly configured switch (I am no expert in
> that area), but it could(?) also be a problem with the card.
> I'll change to different cards on two of the nodes to check it out.
I've seen somewhat similar behavior on cheap Netgear switches (FS108).
See the graphs in the ALSC paper/talk on www.phy.duke.edu/brahma, and
the discussion of latency vs bandwidth bounded communications. Note
also the interesting differences between the netperf-based figure (the
paper) and the bw-tcp based figure (the talk). The latter is more
easily understandable in terms of some sort of boundary, e.g. a buffer
size, but I haven't yet had time to go into the bw-tcp code to add
buffer size as a parameter (it is a parameter in netperf, but netperf
frankly looks "broken" in some way and apparently is no longer being
actively maintained). I'll try to crank out a similar sweep with
NetPIPE wrapped in my Perl sweep script and generate a related figure.
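
For anyone who wants to try this kind of sweep without NetPIPE, here is
a rough Python sketch of a TCP ping-pong test over a range of block
sizes. It is not NetPIPE and not the Perl sweep script mentioned above;
the function names (recv_exact, echo_server, sweep) are made up for
illustration, and it runs both ends on the loopback interface, so it
exercises only the local TCP stack. To test a switch or NIC the echo
side would have to run on a second node. Throughput is computed as
block size over half the round-trip time, which is the usual ping-pong
convention.

```python
import socket
import threading
import time


def recv_exact(sock, n):
    """Read exactly n bytes from sock, looping over short reads."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed connection")
        buf.extend(chunk)
    return bytes(buf)


def echo_server(listener, sizes, reps):
    """Accept one client and echo back every block it sends."""
    conn, _ = listener.accept()
    with conn:
        conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        for size in sizes:
            for _ in range(reps):
                conn.sendall(recv_exact(conn, size))


def sweep(sizes, reps=20, host="127.0.0.1"):
    """Return a list of (block size in bytes, throughput in Mbps)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind((host, 0))          # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    server = threading.Thread(
        target=echo_server, args=(listener, sizes, reps), daemon=True
    )
    server.start()

    results = []
    with socket.create_connection((host, port)) as sock:
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        for size in sizes:
            payload = b"x" * size
            start = time.perf_counter()
            for _ in range(reps):
                sock.sendall(payload)     # ping
                recv_exact(sock, size)    # pong
            elapsed = time.perf_counter() - start
            one_way = elapsed / (2 * reps)            # half the round trip
            results.append((size, size * 8 / one_way / 1e6))

    server.join()
    listener.close()
    return results


if __name__ == "__main__":
    for size, mbps in sweep([64, 1024, 5780, 5800, 65536]):
        print(f"{size:8d} bytes  {mbps:10.2f} Mbps")
```

Sweeping finely around the 5780 to 5800 byte region reported above, and
repeating the run with different socket buffer sizes, would be one way
to see whether the cliff tracks a buffer boundary.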
I too don't know what part of the clearly revealed problem is the
switch, what part is the particular NIC, what part the TCP stack or the
kernel itself. So much complexity, so little time...
If you figure it out, let me know too -- I'd really like to be able to
recover a smooth and understandable transition from latency bounded to
bandwidth bounded behavior (see figure on
www.phy.duke.edu/brahma/brahma.html). I haven't gotten smooth and
predictable behavior like this since 2.2.x was introduced, and my peak
network performance (on the same hardware) has never been the same
either. I have no idea why, but would love to know.
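
To make "smooth transition" concrete, here is a minimal sketch of the
simplest model I know for it: transfer time is a fixed latency plus
block size over peak bandwidth. The numbers used (100 microseconds of
latency, a 100 Mbps link) are assumptions picked for illustration, not
measurements from this thread, and the function names are hypothetical.

```python
def transfer_time(nbytes, latency_s, bandwidth_bps):
    """One-way time for a block: fixed latency plus serialization time."""
    return latency_s + nbytes * 8 / bandwidth_bps


def effective_mbps(nbytes, latency_s, bandwidth_bps):
    """Throughput actually seen for a block of nbytes, in Mbps."""
    return nbytes * 8 / transfer_time(nbytes, latency_s, bandwidth_bps) / 1e6


def crossover_bytes(latency_s, bandwidth_bps):
    """Block size where latency and serialization time are equal.

    Below this size the transfer is latency bounded, above it
    bandwidth bounded; at this size throughput is half of peak.
    """
    return latency_s * bandwidth_bps / 8


if __name__ == "__main__":
    latency = 100e-6      # assumed: 100 microseconds
    bandwidth = 100e6     # assumed: 100 Mbps link
    print("crossover:", crossover_bytes(latency, bandwidth), "bytes")
    for n in (64, 512, 1250, 5780, 5800, 65536):
        print(f"{n:8d} bytes  {effective_mbps(n, latency, bandwidth):8.2f} Mbps")
```

In this model throughput only ever rises with block size, so a drop
like 67.45 Mbps to 1.34 Mbps between 5780 and 5800 bytes cannot come
from the latency/bandwidth tradeoff alone; something else (buffering,
retransmits, the switch) has to be involved.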
> Beowulf mailing list
> Beowulf at beowulf.org
Robert G. Brown http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu