Patrick Begou Patrick.Begou at hmg.inpg.fr
Fri Oct 10 13:55:43 EDT 2003


I'm new to this list, so just two lines about me:
I run a small Linux Beowulf cluster (10 nodes) for computational fluid
dynamics in the south-east of France (National Polytechnic Institute
of Grenoble).

I've just upgraded my cluster hardware (from AMD 1500+ / 100BT Ethernet
to P4 2.8 GHz + Gigabit Ethernet) and upgraded the system from Red Hat
7.1 to Red Hat 7.3, kernel 2.4.20-20-7. The current version of PVM is
pvm-3.4.4-2, as shipped with Red Hat 7.3.

Since this upgrade I'm unable to start PVM on one node from another
(with the add command).
The console hangs for several tens of seconds, then says OK.
The pvmd3 is started on the remote node, but the conf command does not
show the additional node and I get these errors in the /tmp/pvml.xx
file:

[t80040000] 10/10 15:58:31 craya.hmg.inpg.fr (xxx.xxx.xxx.xxx:32772) LINUX 3.4.4
[t80040000] 10/10 15:58:31 ready Fri Oct 10 15:58:31 2003
[t80040000] 10/10 16:01:46 netoutput() timed out sending to craya02 after 14, 190.000000
[t80040000] 10/10 16:01:46  hd_dump() ref 1 t 0x80000 n "craya02" a "" ar "LINUX" dsig 0x408841
[t80040000] 10/10 16:01:46            lo "" so "" dx "" ep "" bx "" wd "" sp 1000
[t80040000] 10/10 16:01:46            sa mtu 4080 f 0x0 e 0 txq 1
[t80040000] 10/10 16:01:46            tx 2 rx 1 rtt 1.000000 id "(null)"

rsh and rexec are working (from master to nodes, from nodes to master
and from node to node). The transfer speed on the network is close to
600 Mbit/s (binary ftp to /dev/null).
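
Note that rsh, rexec and ftp all run over TCP, whereas the pvmd daemons
talk to each other over UDP, so the tests above do not exercise the same
path as the netoutput() timeout in the log. A minimal sketch of a
separate UDP round-trip check, assuming Python is available on the
nodes (the function names and the port number are my own choices for
illustration, not part of PVM):

```python
import socket


def udp_echo_server(sock, count=1):
    """Echo back `count` datagrams received on an already-bound UDP socket."""
    for _ in range(count):
        data, addr = sock.recvfrom(4096)
        sock.sendto(data, addr)


def udp_probe(host, port, payload=b"ping", timeout=2.0):
    """Send one datagram to host:port.

    Returns True only if the same payload comes back before the timeout.
    A timeout or a socket error (for example "connection refused" when
    nothing listens on the port) returns False.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        try:
            s.sendto(payload, (host, port))
            data, _ = s.recvfrom(4096)
        except (socket.timeout, OSError):
            return False
        return data == payload
```

To use it, bind a UDP socket on one node (for example on an arbitrary
free port such as 50007), run udp_echo_server() on it, and call
udp_probe("craya02", 50007) from the other node, then repeat in the
opposite direction. A False result in only one direction would point
at filtering or routing rather than at PVM itself.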

variables are set:

I've tried so many things over the last 3 days:

- compiled and installed pvm3.4.4.tgz from source
- uninstalled iptables, ipchains and iplock
- removed /etc/security (to test this as root)
- added .rhosts and hosts.equiv files
- on the master, eth0 is 100 Mbit/s towards the Internet and eth1 is
Gigabit towards the nodes.
I've also tried the opposite config: eth0 as Gigabit and eth1 as 100BT.

Always the same problem!
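
Since the master has two interfaces, one thing that may be worth ruling
out is which address each host name resolves to on each machine. This
is only a guess, not a confirmed cause. A minimal sketch, assuming
Python is available (the helper names are hypothetical; the host names
in the usage note are the ones from the log above):

```python
import ipaddress
import socket


def resolve_ipv4(name):
    """Return the sorted IPv4 addresses `name` resolves to, or [] on failure."""
    try:
        infos = socket.getaddrinfo(name, None, socket.AF_INET, socket.SOCK_DGRAM)
    except socket.gaierror:
        return []
    return sorted({info[4][0] for info in infos})


def is_loopback(addr):
    """True if `addr` is a loopback address such as 127.0.0.1."""
    return ipaddress.ip_address(addr).is_loopback


def report(names):
    """Return one line per (name, address), flagging loopback addresses."""
    lines = []
    for name in names:
        addrs = resolve_ipv4(name)
        if not addrs:
            lines.append(f"{name}: does not resolve")
            continue
        for addr in addrs:
            flag = " (loopback)" if is_loopback(addr) else ""
            lines.append(f"{name}: {addr}{flag}")
    return lines
```

Running print("\n".join(report([socket.gethostname(), "craya",
"craya02"]))) on the master and on a node shows whether both sides agree
on the addresses. A master name that resolves to a loopback address, or
to the interface the nodes cannot reach, would be suspicious.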

The cluster is down and I do not know where to look for a solution.

I would be grateful if someone could help me solve this problem.

Thanks for your help

