Channel bonding performance problem

Jakob Østergaard jakob at unthought.net
Thu Jul 19 12:30:52 EDT 2001


On Thu, Jul 19, 2001 at 05:17:58AM -0500, Hao He wrote:
> Hi there.
> 
> I just finished some channel bonding tests with 2 Dual-3C905-NIC PCs. 
> But the network performance is only a little bit higher than in the single-NIC case.
> I don't know why. The details follow:
> 
> Node configuration:
> Intel Pentium III
> Dual 3C905 NIC
> Redhat 7.1 (2.4.2)

Is it one dual-port NIC, or two normal 3C905s ?  I don't even know if the former exists,
so maybe this is a stupid question :)

> 
> When Linux was installed, I created bond0 and modified modules.conf and ifcfg-eth0, ifcfg-eth1, ifcfg-bond0
> according to the mini-Howto.
> After rebooting the system, I saw that channel bonding was successful since by running ifconfig, 
> I found bond0, eth0, eth1 had the same IP address and MAC address.

Good.

What about the routes ?  Please make sure that your routing tables use the bond0 device.
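
One rough way to check this without reading the table by eye is the minimal
Python sketch below. It is not from the bonding mini-Howto. It parses text in
the /proc/net/route format and lists every route that leaves through a device
other than bond0 or lo. The function names are my own, and the address decoding
assumes the little-endian hex form that /proc/net/route shows on x86.

```python
import socket
import struct


def hex_to_ip(hex_le):
    """Convert a little-endian hex address (as in /proc/net/route on x86)
    to dotted-quad notation."""
    return socket.inet_ntoa(struct.pack("<L", int(hex_le, 16)))


def parse_routes(text):
    """Return a list of (iface, destination, netmask) tuples, one per
    data row of /proc/net/route style text."""
    routes = []
    for line in text.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 8:
            continue  # ignore short or malformed rows
        iface, dest, mask = fields[0], fields[1], fields[7]
        routes.append((iface, hex_to_ip(dest), hex_to_ip(mask)))
    return routes


def non_bond_routes(routes, bond="bond0"):
    """Return the routes that do not use the bonding device or loopback."""
    return [r for r in routes if r[0] not in (bond, "lo")]
```

On a bonded node you would call
non_bond_routes(parse_routes(open("/proc/net/route").read())). An empty list
means every non-loopback route goes through bond0; anything else is a route
that still points at eth0 or eth1 directly.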

> 
> Then I installed netperf 2.1. I had to remove most of the default arguments to compile it on my system.

First you did tests without bonding, then you installed bonding and upgraded
the test software ?  That is not a very good way to ensure that your tests can
be compared...


> 
> When I ran just a local test with netperf, the result looked good. The speed was more than 700Mbps, double
> what it was before. However, when I tested the network performance between the 2 nodes, I found the improvement 
> was quite limited, say, only about 10% (from 90Mbps to 100Mbps).

The 700 Mbps figure is from the loopback device - it has nothing to do with what your NICs can do.

I suppose the doubling of speed is because of the netperf upgrade.  Or something else.

The network performance *should* roughly double. I am pretty sure that
something is causing the bonding to just not work at all. Then I suppose the
extra 10% you get is because of the same change that caused your doubling to
700Mbps on the local test.

Does this sound reasonable to you ?

> 
> I checked the system by running ifconfig. It seems the bonding worked well because the RX, TX data of the 2 NICs 
> are almost equal, which means the traffic is split evenly between the 2 cards.

Hmm...  Then I suppose your routing is OK.

> 
> Then I tried to FTP a big file (115MB) between the two nodes. This time I got an average speed of about 9.2MB/s.
> This speed seems to be double what it was before. So I am confused.

You should be able to get roughly 9 MB/sec on *one* 100Mbps connection.  I don't know why
you only got 4.5 MB/sec before.  Are you sure about this ?
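
To put numbers on that, here is a small sketch of the arithmetic in Python. The
100 Mbps, 90 Mbps and 9 MB/sec figures are the ones from this thread. The
"ideal" doubling is only an upper bound I am assuming for two bonded links, not
a measured result, and the helper names are made up for illustration.

```python
LINK_MBPS = 100.0        # one Fast Ethernet link, in megabits per second
SINGLE_LINK_FTP = 9.0    # MB/sec, rough single-link FTP figure from above
SINGLE_LINK_NETPERF = 90.0  # Mbps, netperf result without bonding


def mbit_to_mbyte(mbps):
    """Convert megabits per second to megabytes per second."""
    return mbps / 8.0


def ideal_bonded(single_rate, links):
    """Upper bound for bonding: the single-link rate times the link count.
    Real bonding will come in somewhat below this."""
    return single_rate * links


print(mbit_to_mbyte(LINK_MBPS))              # 12.5 MB/sec raw line rate
print(ideal_bonded(SINGLE_LINK_FTP, 2))      # 18.0 MB/sec for two links
print(ideal_bonded(SINGLE_LINK_NETPERF, 2))  # 180.0 Mbps for two links
```

So 9.2 MB/sec over FTP is what a single healthy link gives, and a working
two-link bond should land much closer to 18 MB/sec (or 180 Mbps in netperf)
than to the 100 Mbps you measured.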

To troubleshoot:

Can you try to ping from one host to the other, using bonding ?  Then make sure
that the RX and TX counters increase on both NICs on both machines, and that
the packet loss is 0%.
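
If you would rather not compare ifconfig output by hand, the counter check can
be sketched in Python as below. It parses text in the /proc/net/dev format,
takes one snapshot before the ping and one after, and reports any slave whose
RX or TX packet counter did not move. The function names and the eth0/eth1
default are my assumptions; adjust them to your setup. Packet loss itself still
has to be read from the ping summary.

```python
def parse_net_dev(text):
    """Return {iface: (rx_packets, tx_packets)} from /proc/net/dev text."""
    counters = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # header lines carry no "iface:" prefix
        name, data = line.split(":", 1)
        fields = data.split()
        if len(fields) < 10:
            continue  # not a full counter row
        # Field 1 is RX packets, field 9 is TX packets.
        counters[name.strip()] = (int(fields[1]), int(fields[9]))
    return counters


def idle_slaves(before, after, slaves=("eth0", "eth1")):
    """Return the slaves whose RX or TX packet count did not increase
    between the two snapshots."""
    idle = []
    for name in slaves:
        rx0, tx0 = before[name]
        rx1, tx1 = after[name]
        if rx1 <= rx0 or tx1 <= tx0:
            idle.append(name)
    return idle
```

Take the snapshots with parse_net_dev(open("/proc/net/dev").read()) on each
machine, once before and once after the ping run. An empty result from
idle_slaves() on both hosts means both NICs are sending and receiving.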

> 
> I don't know why and how to solve this problem.
> Any suggestion and help from you will be highly appreciated.

Please let me know what you think of my guesses, and post the results from the
ping test on this list.  Then we'll take it from there.

-- 
................................................................
:   jakob at unthought.net   : And I see the elder races,         :
:.........................: putrid forms of man                :
:   Jakob Østergaard      : See him rise and claim the earth,  :
:        OZ9ABN           : his downfall is at hand.           :
:.........................:............{Konkhra}...............:

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf


