[Beowulf] gigabit ethernet: horrible performance for 0 byte messages
Gerry Creager N5JXS
gerry.creager at tamu.edu
Wed Feb 11 23:13:12 EST 2004
Realize that not all switches are created equal when it comes to small
packets (and a 0-byte message is about as small as it gets). A number of
otherwise decent network switches are less than stellar performers with
small packets.
We've evaluated this in my lab with an Anritsu MD-1230 Ethernet test
system running under the RFC-2544 testing suite...
There are switches that perform well with small packets, but in our
experience most switches, especially the lower-cost ones (Cisco
2900/2950/3500, 4000/4500; Allied Telesyn *; Cabletron *; some others I
can't recall right now), didn't perform well with smaller packets but
did fine when the packet size was about 1500 bytes.
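
For a rough sense of why small packets are the harder case, here is a
back-of-the-envelope sketch of the theoretical frame rate a gigabit port
has to sustain at line rate. It is only illustrative arithmetic, not
output from our test runs: the function name is made up, the frame sizes
are the standard RFC 2544 Ethernet sizes, and the 20 bytes of per-frame
overhead are the preamble, start-of-frame delimiter, and minimum
inter-frame gap.

```python
# Theoretical maximum Ethernet frame rate at full line rate.
# Assumption: each frame costs an extra 20 bytes on the wire
# (7-byte preamble + 1-byte start-of-frame delimiter + 12-byte inter-frame gap).
WIRE_OVERHEAD_BYTES = 20


def max_frames_per_second(frame_bytes, link_bps=1_000_000_000):
    """Frames per second at line rate; frame_bytes includes header and FCS."""
    bits_on_wire = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return link_bps / bits_on_wire


if __name__ == "__main__":
    # Standard RFC 2544 frame sizes for Ethernet.
    for size in (64, 128, 256, 512, 1024, 1280, 1518):
        rate = max_frames_per_second(size)
        print(f"{size:5d} bytes: {rate:12,.0f} frames/s")
```

At 64 bytes a gigabit port has to forward roughly 1.49 million frames
per second, versus about 81 thousand at 1518 bytes, so per-packet
processing cost in the switch matters far more at the small end.
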
Going with cheap switches is usually not a good way to improve performance.
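
If you want a quick check of small-message round-trip time that doesn't
depend on MPI, a simple TCP ping-pong is one option. The sketch below is
only an illustration and is not netperf or netpipe; all function names
are hypothetical. It runs over loopback so that it is self-contained; to
test a real link you would run the echo side on one node and point
ping_pong at that node's address. It sends 1-byte messages because a
0-byte TCP send puts nothing on the wire, and it is meant for small
messages only.

```python
import socket
import threading
import time


def echo_server(listener):
    """Accept one connection and echo back everything received."""
    conn, _ = listener.accept()
    with conn:
        # Disable Nagle so small replies are sent immediately.
        conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        while True:
            data = conn.recv(4096)
            if not data:
                break
            conn.sendall(data)


def recv_exact(sock, n):
    """Read exactly n bytes from a stream socket."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed connection")
        buf.extend(chunk)
    return bytes(buf)


def ping_pong(host, port, msg_bytes=1, rounds=1000):
    """Return the mean round-trip time in microseconds."""
    payload = b"x" * msg_bytes
    with socket.create_connection((host, port)) as s:
        s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        start = time.perf_counter()
        for _ in range(rounds):
            s.sendall(payload)
            recv_exact(s, msg_bytes)
        elapsed = time.perf_counter() - start
    return elapsed / rounds * 1e6


def run_loopback_demo(msg_bytes=1, rounds=200):
    """Start an echo server on loopback and measure against it."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    server = threading.Thread(target=echo_server, args=(listener,), daemon=True)
    server.start()
    rtt_us = ping_pong("127.0.0.1", port, msg_bytes, rounds)
    server.join(timeout=5)
    listener.close()
    return rtt_us


if __name__ == "__main__":
    for size in (1, 64, 1024):
        print(f"{size:5d} bytes: {run_loopback_demo(size):8.1f} us round trip")
```

Loopback numbers only tell you about the host's TCP stack; the
interesting comparison is the same test across the switch versus across
a direct cable.
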
gerry
Douglas Eadline, Cluster World Magazine wrote:
> On Wed, 11 Feb 2004, Bernhard Wegner wrote:
>
>
>>Hello,
>>
>>I have a really small "cluster" of 4 PCs which are connected by a normal
>>Ethernet 100 Mbit switch. Because the motherboards have Gigabit-LAN on board
>>I thought I might be able to improve performance by connecting the machines
>>via a Gigabit switch (which are really cheap nowadays).
>>
>>Everything seemed to work fine. The switch indicates 1000Mbit connections to
>>the PCs and transfer rate for scp-ing large files is significantly higher
>>now, but my software using mpich RUNS about a factor of 4-5 SLOWER NOW than
>>with the 100 Mbit switch.
>>
>>I wasn't able to actually track down the problem, but it seems that there is
>>a problem with small messages. When I run the performance test provided with
>>mpich, it reports (bshort2/bshort4) extremely long times (e.g. 1500 us) for 0
>>byte message length, while for larger messages everything looks fine (linear
>>dependence of transfer time on message length, everything below 300 us). I
>>have also tried mpich2 which shows exactly the same behavior.
>>
>>Does anyone have any idea?
>
>
> First, I assume you were running the 100BT through the same
> onboard NICs and got reasonable performance. So some possible
> things:
>
> - the switch is a dog or it is broken
> - your cables may be old or bad (but worked fine for 100BT)
> - negotiation problem
>
> Some things to try:
>
> Use a crossover cable (cat5e) and see if you get the same problem.
> You might also try a lower-level benchmark (of the micro variety)
> such as netperf or netpipe.
>
> The Beowulf Performance Suite:
> http://www.clusterworld.com/article.pl?sid=03/03/17/1838236
>
> has these tests. Also, the December and January issues of ClusterWorld
> show how to test a network connection using netpipe. At some point this
> content will be showing up on the web-page.
>
> Also, the MPI Link-checker from Microway (www.microway.com) may help:
>
> http://www.clusterworld.com/article.pl?sid=04/02/09/1952250
>
>
> Doug
>
>
>>Here are the details of my system:
>> - Suse Linux 9.0 (kernel 2.4.21)
>> - mpich-1.2.5.2
>> - motherboard ASUS P4P800
>> - LAN (10/100/1000) on board (3COM 3C940 chipset)
>> - LevelOne 10/100/1000 8-port Fast Ethernet Switch (chipset: TC9208M +
>>   8x88E1111-BAB, AT89C2051-24PI)
>>
>>
>
>
--
Gerry Creager -- gerry.creager at tamu.edu
Network Engineering -- AATLT, Texas A&M University
Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.847.8578
Page: 979.228.0173
Office: 903A Eller Bldg, TAMU, College Station, TX 77843