Channel-bonding/VLAN with Scyld

Mike Weller weller at
Mon Jul 9 19:56:53 EDT 2001


I sent an email to the list last week regarding configuring our HP
ProCurve 4000m switch for channel-bonding.

If the OS was configured for channel-bonding without any switch
configuration, I got only 17Mbps :-( When I turned on trunking for
certain ports on the switch, I got about 100Mbps, which was only a
slight improvement from 1 NIC (plus, it was using SA/DA, so there was
no node-to-node bandwidth improvement).
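
As a rough illustration of why SA/DA trunking gives no node-to-node
improvement: the switch picks a trunk port from the source and
destination addresses, so a given pair of nodes always uses the same
link.  The sketch below assumes a simple XOR-and-modulo hash on the last
MAC octets.  That hash, and the name pick_link, are assumptions made for
illustration, not the ProCurve's documented algorithm.

```shell
#!/bin/sh
# Hypothetical SA/DA link selection: XOR the last octet of the source and
# destination MAC addresses, then reduce modulo the number of trunk ports.
# This is an assumed hash for illustration only.
pick_link() {
    src_octet=$1   # last octet of the source MAC, in hex (e.g. 1a)
    dst_octet=$2   # last octet of the destination MAC, in hex
    links=$3       # number of ports in the trunk
    echo $(( (0x$src_octet ^ 0x$dst_octet) % links ))
}

# The same address pair always maps to the same link, however much
# traffic flows between the two nodes.
pick_link 1a 2b 2
pick_link 1a 2b 2
# A different destination may map to the other link.
pick_link 1a 2c 2
```

Under this assumption, aggregate traffic across many node pairs can
spread over the trunk, but any single pair is limited to one link.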

I got a response suggesting that I should set up 2 VLANs, with all
eth0's on VLAN-1 and all eth1's on VLAN-2.  The responder said that he
can get 190Mbps this way.  The trunking configuration is supposed to be
for switch-to-switch connections.  He was using an HP 2400 (or was it
2424?).  The manual I have is for both models, so I assume that it
will work with mine as well.

I am still having major difficulties with getting it to work with

I telnetted to my switch, and added VLAN1 and VLAN2.  All slave eth0's
were set to VLAN1 and eth1's to VLAN2.  (Note: Master has 3 NICs, so
eth1 was put on VLAN1 and eth2 on VLAN2).  I did not configure any
overlap between the VLANs.
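
The layout above can be written down as a small lookup, which makes it
easier to see which NIC pairs can exchange frames once the VLANs do not
overlap.  This is a minimal sketch: the "slave:" and "master:" labels
and the function names are made up for illustration and are not switch
syntax.

```shell
#!/bin/sh
# VLAN membership as described in the text. The "slave:"/"master:"
# labels are naming used only by this sketch.
vlan_of() {
    case "$1" in
        slave:eth0|master:eth1) echo 1 ;;
        slave:eth1|master:eth2) echo 2 ;;
        *) echo none ;;   # anything the post does not assign to a VLAN
    esac
}

# With no overlap between the VLANs, two NICs can exchange frames only
# if they are members of the same VLAN.
can_talk() {
    a=$(vlan_of "$1")
    b=$(vlan_of "$2")
    if [ "$a" != none ] && [ "$a" = "$b" ]; then
        echo yes
    else
        echo no
    fi
}

# Print every slave/master NIC pairing and whether it shares a VLAN.
for s in eth0 eth1; do
    for m in eth1 eth2; do
        echo "slave:$s <-> master:$m  $(can_talk "slave:$s" "master:$m")"
    done
done
```

This only models VLAN membership.  It says nothing about how the bonding
driver schedules packets across the enslaved interfaces.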

I had to temporarily disable channel-bonding on the master to get
the slave to boot:

 /etc/init.d/beowulf stop ; ifconfig bond0 down ; ifconfig eth2 down ; 
 ifconfig eth1 down ; ifconfig bond0 inet netmask ; 
 ifconfig eth1 inet netmask ; 
 ifenslave bond0 eth1 ; /etc/init.d/beowulf start

After doing so, the slave node was able to boot from its eth0,
grabbing the image from the master's eth1.

Now the question is, how am I supposed to channel-bond the nodes
after this point?

When ALL NICs were part of the same VLAN, these scripts used to work:

SLAVE:  (no idea if there's a better way)
#!/bin/csh -f
set node=$1
modprobe --node $node bonding
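# Write the commands the node should run; the backquoted bpstat calls
# are expanded here on the master, before the file is copied.
cat <<EOF > /tmp/runme
 ifconfig eth0 inet `bpstat -a $node` netmask
 ifconfig bond0 inet `bpstat -a $node` netmask
 ifenslave bond0 eth0
 ifenslave bond0 eth1
EOF
# Copy the file to the node and run it there, detached from this shell.
bpcp /tmp/runme ${node}:/tmp
bpsh $node nohup csh -f /tmp/runme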

MASTER:
ifenslave bond0 eth2

Now that the NICs are on 2 distinct VLANs, the SLAVE script "hangs",
which makes sense because its eth1 interface is transmitting half of
the packets onto an isolated VLAN that MASTER's eth2 was not
configured for yet.  When I ran the MASTER line immediately after
that, it did not remedy the problem.

From the master, I could still ping the slave, but TCP/IP connections
could not be established for some reason.  I had purposely started an
FTP server on the slave so that I could test forming TCP/IP
connections afterward.  ftp'ing to the slave just hung, although I
could sniff 2-way packets from to (bpmaster's port).  I'm guessing
that the connection was spoiled since the slave put up this message:

bproc: connect: connect failed, errno=111
bpslave: short read - lost connection to master
  rebooting in 30 seconds
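
For reference, errno 111 on Linux is ECONNREFUSED, so the connect()
call was refused rather than timing out.  The lookup below is a small
sketch covering only a few network-related codes with their Linux x86
numbering; errno_name is a made-up helper, not part of bproc.

```shell
#!/bin/sh
# Map a few Linux errno values (x86 numbering) to their symbolic names.
# Only a handful of network-related codes are listed; anything else
# prints "unknown".
errno_name() {
    case "$1" in
        104) echo ECONNRESET ;;
        110) echo ETIMEDOUT ;;
        111) echo ECONNREFUSED ;;
        113) echo EHOSTUNREACH ;;
        *)   echo unknown ;;
    esac
}

# The value reported by bproc in the message above.
errno_name 111
```

A refusal, as opposed to ETIMEDOUT or EHOSTUNREACH, suggests that the
slave's connection attempt got some answer.  That is one reading of the
message, not a diagnosis.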

"ifconfig -a" on the master and slave showed that they were properly
bonded.  I also tricked the slave into giving me a shell so that I
could type stuff after it lost connection with the master. I did this
by "bpsh 0 csh -f /tmp/shell"
/tmp/shell looks like:
tcsh < /dev/console > /dev/console

From the slave, I was still able to ping the master, and "ifconfig -a"
showed that it was properly bonded.  Of course, it rebooted me in
30 seconds :-(

At this point, I'll buy a whole new switch if it makes my life easier!
Any ideas?


Michael J. Weller, M.Sc.               office: (972) 235-7881 x.242
weller at                         cell: (214) 616-6340
Zyvex Corp., 1321 N Plano           facsimile: (972) 235-7882    
Richardson, TX 75081                      icq: 6180540
