Channel bonding: working combinations?

Martin Siegert siegert at sfu.ca
Tue Jan 23 15:44:46 EST 2001


Hi Daniel,

I had none of your problems when using the DFE570Tx with the tulip
driver (see the other post). I never actually had to run ifenslave or
ifconfig manually; the configuration comes up reliably after a reboot
or after running "/etc/rc.d/init.d/network restart".

Hence I can only guess where your problems may be:
1. I trust that you have the line "alias bond0 bonding" in your
   /etc/conf.modules (or /etc/modules.conf, whichever you are using) file.
   (It sounds stupid, but I made that mistake once.)
2. You mentioned that you use eth0 for a different network. Is it using
   the same driver as the other cards? If it is: how do you tell which
   card your machine is recognizing as eth0? This happened to me over
   and over again: if you plug in a second NIC you cannot be sure that
   the new card will be eth1 - it may just as well be eth0 and the old
   card may come up as eth1, creating nothing but problems.
   The only way I found to figure this out is to run ping on the
   network that is connected to eth0 and see which card's lights
   are flashing (and then swap cards).
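
Both checks above can also be roughly scripted. The following is only a
minimal sketch in Python, under stated assumptions: the function names
(has_bonding_alias, hwaddrs) are hypothetical, the ifconfig output is
assumed to be in the net-tools format quoted below, and the second check
only helps if you have noted each card's hardware address beforehand.

```python
import re


def has_bonding_alias(modules_conf_text, bond="bond0"):
    """Return True if an uncommented 'alias <bond> bonding' line is present.

    modules_conf_text is the content of /etc/conf.modules or
    /etc/modules.conf, passed in as a string (hypothetical helper).
    """
    for line in modules_conf_text.splitlines():
        # Drop anything after a '#' comment marker, then split on whitespace.
        fields = line.split("#", 1)[0].split()
        if fields == ["alias", bond, "bonding"]:
            return True
    return False


def hwaddrs(ifconfig_text):
    """Map interface name -> hardware address from 'ifconfig -a' style text.

    Assumes lines of the form
    'eth1      Link encap:Ethernet  HWaddr 00:40:05:A1:D9:09'.
    Interfaces without an HWaddr field (e.g. lo) are skipped.
    """
    pattern = re.compile(
        r"^(\S+)\s+Link encap:\S+\s+HWaddr\s+([0-9A-Fa-f:]{17})"
    )
    result = {}
    for line in ifconfig_text.splitlines():
        match = pattern.match(line)
        if match:
            result[match.group(1)] = match.group(2).upper()
    return result
```

Pass has_bonding_alias the contents of /etc/conf.modules (or
/etc/modules.conf), and pass hwaddrs the output of "ifconfig -a" taken
before the cards are enslaved. Once bonded, all slaves report the same
address as bond0 (as eth1 and eth2 do in the output quoted below), so the
mapping no longer tells the cards apart.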

I hope this helps.

Cheers,
Martin

========================================================================
Martin Siegert
Academic Computing Services                        phone: (604) 291-4691
Simon Fraser University                            fax:   (604) 291-4242
Burnaby, British Columbia                          email: siegert at sfu.ca
Canada  V5A 1S6
========================================================================

On Mon, Jan 22, 2001 at 08:58:55AM +0100, Pfenniger Daniel wrote:
> 
> I am trying to install channel bonding on our cluster, but I have run
> into a few problems that may interest people on the list.
> 
> Linux kernel: 2.2.18 or 2.4.0, compiled with gcc 2.95.2, (RedHat 6.2)
> Motherboard: ASUS P2B-D (BX chipset)
> Procs: Pentium II 400 dual
> Ethernet cards: with the tulip chips DS21140 and DS21143. They work well 
>    when not bonded.
> Switches: 2 Foundry FastIron II 
> Drivers: tulip.o, or old_tulip.o as modules supplied with the official kernel
> Documentation: in /usr/src/linux-2.2.18/Documentation/networking/bonding.txt 
>                (BTW this file is not provided in kernel 2.4.0)
> 
> I have strictly followed the instructions in bonding.txt.
> Every card has a distinct IRQ.
> 
> The first problem is that ifconfig bond0 does not show any hardware
> or IP address, at boot or interactively (both are zero).
> I can force a hardware address by setting it manually:
> 
>    ifconfig bond0 192.168.2.64 hw ether 00:40:05:A1:D9:09 up
> 
> Here I don't know how to automatically force the hw address in the 
> ifcfg-bond0 file.
> 
> Incidentally there are a few different versions of ifenslave.c on the net 
> with the same version number  (v0.07 9/9/97  Donald Becker
> (becker at cesdis.gsfc.nasa.gov)).  
> I have taken the version included with the bonding-0.2.tar.gz tarball.
> 
> By manually starting channel bonding I get (eth0 is assigned to another
> network): 
> 
> bond0     Link encap:Ethernet  HWaddr 00:40:05:A1:D9:09  
>           inet addr:192.168.2.64  Bcast:192.168.2.255  Mask:255.255.255.0
>           UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
>           RX packets:108 errors:38 dropped:0 overruns:0 frame:0
>           TX packets:6 errors:5 dropped:0 overruns:0 carrier:15
>           collisions:0 txqueuelen:0 
> 
> eth1      Link encap:Ethernet  HWaddr 00:40:05:A1:D9:09  
>           inet addr:192.168.2.64  Bcast:192.168.2.255  Mask:255.255.255.0
>           UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
>           RX packets:108 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:6 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:100 
>           Interrupt:18 Base address:0xb800 
> 
> eth2      Link encap:Ethernet  HWaddr 00:40:05:A1:D9:09  
>           inet addr:192.168.2.64  Bcast:192.168.2.255  Mask:255.255.255.0
>           UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
>           RX packets:0 errors:38 dropped:0 overruns:0 frame:0
>           TX packets:0 errors:5 dropped:0 overruns:0 carrier:15
>           collisions:0 txqueuelen:100 
>           Interrupt:17 Base address:0xb400 
> 
> Then a ping to another such bonded node may produce different things: 
> - a complete freeze, reset required.
> - ping waits, ctrl-c stops it.
> - ping works, with almost double speed
> 
> When ping works netperf -H node may either be almost twice as fast (175 Mb/s) 
> as single channel communications (94 Mb/s), or much slower (10, 25 Mb/s), 
> despite ping indicating improved communication time.
> 
> In conclusion channel bonding with such a configuration appears unreliable. 
> 
> Since several messages reporting problems with the present channel
> bonding capability of the Linux kernel have been posted on this list,
> as well as on the tulip list about the tulip drivers, it would be useful
> if people with working combinations of kernel (is 2.2.17 better?),
> NIC/driver (which tulip version?), etc., could share their detailed
> working specs.
> I am sure this would be much appreciated by those wanting to bond their Beowulf.

_______________________________________________
Beowulf mailing list
Beowulf at beowulf.org
http://www.beowulf.org/mailman/listinfo/beowulf


