[Beowulf] gigabit ethernet: horrible performance for 0 byte messages

Douglas Eadline, Cluster World Magazine deadline at linux-mag.com
Wed Feb 11 19:31:43 EST 2004


On Wed, 11 Feb 2004, Bernhard Wegner wrote:

> Hello,
> 
> I have a really small "cluster" of 4 PC's which are connected by a normal 
> Ethernet 100 Mbit switch. Because the motherboards have Gigabit-LAN on board 
> I thought I might be able to improve performance by connecting the machines 
> via a Gigabit switch (which are really cheap nowadays).
> 
> Everything seemed to work fine. The switch indicates 1000Mbit connections to 
> the PC's and transfer rate for scp-ing large files is significantly higher 
> now, but my software using mpich RUNS about a factor of 4-5 SLOWER NOW than 
> with the 100 Mbit switch.
> 
> I wasn't able to actually track down the problem, but it seems that there is 
> a problem with small messages. When I run the performance test provided with 
> mpich, it reports (bshort2/bshort4) extremely long times (e.g. 1500 us) for 0 
> byte message length, while for larger messages everything looks fine (linear 
> dependence of transfer time on message length, everything below 300 us). I 
> have also tried mpich2 which shows exactly the same behavior.
> 
> Does anyone have any idea?

First, I assume you were running 100BT through the same 
onboard NICs and got reasonable performance. If so, some possible
causes:

- the switch is a dog or it is broken
- your cables may be old or bad (but worked fine for 100BT)
- negotiation problem
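
To rule out the negotiation item, it helps to look at what each node thinks it negotiated, not only at the switch LEDs. Below is a minimal sketch, assuming a recent Linux kernel that exposes /sys/class/net/<iface>/speed and /sys/class/net/<iface>/duplex (the 2.4 kernel mentioned in the original post predates sysfs, so this would not apply there as-is). The function names and the interface name "eth0" are placeholders of mine.

```python
from pathlib import Path


def link_settings(iface, sysfs_root="/sys/class/net"):
    """Return (speed_mbit, duplex) as reported by the kernel.

    Either value is None when the attribute cannot be read, for example
    when the interface does not exist or the link is down.
    """
    base = Path(sysfs_root) / iface

    def read(name):
        try:
            return (base / name).read_text().strip()
        except OSError:
            return None

    speed = read("speed")
    duplex = read("duplex")
    try:
        speed_mbit = int(speed) if speed is not None else None
    except ValueError:
        speed_mbit = None
    return speed_mbit, duplex


def is_gigabit_full_duplex(speed_mbit, duplex):
    """True only when the link came up at 1000 Mbit/s, full duplex."""
    return speed_mbit == 1000 and duplex == "full"


if __name__ == "__main__":
    # "eth0" is an assumed interface name; substitute the real one.
    speed, duplex = link_settings("eth0")
    print("speed:", speed, "duplex:", duplex,
          "ok:", is_gigabit_full_duplex(speed, duplex))
```

A node that reports half duplex, or a speed other than 1000, while the switch shows a gigabit link would point at a negotiation mismatch.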

Some things to try:

Use a crossover cable (Cat5e) and see if you get the same problem.
You might also try a lower-level micro-benchmark
such as netperf or NetPIPE. 
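
To show what such a micro-benchmark measures, here is a rough ping-pong sketch in Python. It is not netperf or NetPIPE and does not go through MPI; it only times small TCP round trips with Nagle's algorithm disabled (TCP_NODELAY). A 1-byte payload stands in for the "0 byte" MPI message, which still carries a header on the wire. All function names are mine, and the demo runs over loopback so that it is self-contained; for a real test the echo side would have to run on a second node.

```python
import socket
import threading
import time


def echo_once(listener):
    """Accept a single connection and echo bytes back until the peer closes."""
    conn, _ = listener.accept()
    with conn:
        conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        while True:
            data = conn.recv(65536)
            if not data:
                break
            conn.sendall(data)


def recv_exact(sock, n):
    """Read exactly n bytes from sock, or raise if the peer closes early."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed early")
        buf.extend(chunk)
    return bytes(buf)


def round_trip_us(sock, size, rounds):
    """Mean round-trip time in microseconds for messages of `size` bytes."""
    payload = b"x" * size
    start = time.perf_counter()
    for _ in range(rounds):
        sock.sendall(payload)
        recv_exact(sock, size)
    return (time.perf_counter() - start) / rounds * 1e6


def measure_loopback(sizes=(1, 64, 1024, 8192), rounds=200):
    """Run the ping-pong against a local echo thread; return {size: usec}."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    server = threading.Thread(target=echo_once, args=(listener,), daemon=True)
    server.start()
    results = {}
    with socket.create_connection(listener.getsockname()) as sock:
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        for size in sizes:
            results[size] = round_trip_us(sock, size, rounds)
    server.join(timeout=5)
    listener.close()
    return results


if __name__ == "__main__":
    for size, usec in measure_loopback().items():
        print(f"{size:6d} bytes  {usec:10.1f} us round trip")
```

If the smallest message takes much longer than the larger ones between two nodes, as in the bshort results quoted above, the problem is in per-packet latency and not in bandwidth.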

The Beowulf Performance Suite:
http://www.clusterworld.com/article.pl?sid=03/03/17/1838236

has these tests. Also, the December and January issues of ClusterWorld
show how to test a network connection using NetPIPE. At some point this 
content will appear on the website. 

The MPI Link-checker from Microway (www.microway.com) may also help:

http://www.clusterworld.com/article.pl?sid=04/02/09/1952250


Doug

> 
> Here are the details of my system: 
>  - Suse Linux 9.0 (kernel 2.4.21)
>  - mpich-1.2.5.2
>  - motherboard ASUS P4P800
>  - LAN (10/100/1000) on board (3COM 3C940 chipset)
>  - LevelOne 10/100/1000 8-port Fast Ethernet Switch (chipset: TC9208M +
>    8x88E1111-BAB, AT89C2051-24PI)
> 
> 

-- 
----------------------------------------------------------------
Editor-in-chief                   ClusterWorld Magazine
Desk: 610.865.6061                            
Cell: 610.390.7765         Redefining High Performance Computing
Fax:  610.865.6618                www.clusterworld.com

