[Beowulf] mpirun issue
reuti at staff.uni-marburg.de
Tue Oct 21 08:53:26 EDT 2008
On 21.10.2008 at 01:18, Luis Alejandro Del Castillo Riley wrote:
> Hi fellows, I have a cluster with 1 master and 10 nodes with Intel Xeon
> quad-core CPUs.
> Fedora core 6
> PGI 7.0-7
> mpich 18.104.22.168
The last version of MPICH, from 2005, is 1.2.7p1. For newer
installations I would suggest looking into Open MPI.
> machines.x86_64 with a 10 node names
Do you mean it lists only the 10 nodes?
> when i try to run:
> mpirun -v -arch x86_64 -keep_pg -nolocal -np 9 mm5.mpp
> I get no error, but when I run with
> mpirun -v -arch x86_64 -keep_pg -nolocal -np 10 mm5.mpp
> it takes around 40 min to give me an error:
> bm_list_4667: (1526.781250) wakeup_slave: unable to interrupt slave
> 0 pid 4666
Since it takes that long, I would suggest logging in to all nodes and
checking the distribution and startup of the processes with:
$ ps -e f
(f without the leading -). Is it doing nothing for 40 minutes, or
running fine until it crashes?
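
If logging in to each node by hand is tedious, a loop over the machines file can collect the same listing. This is only a rough sketch: the function name check_nodes is made up for illustration, and it assumes passwordless ssh to the nodes and a machines file with one host per line (optionally written as host:ncpus).

```shell
# Hypothetical helper (not part of MPICH): run "ps -e f" on every host
# listed in a machines file. Assumes passwordless ssh to each node.
check_nodes() {
    machines_file="$1"      # e.g. machines.x86_64
    runner="${2:-ssh}"      # pass "echo" as 2nd argument for a dry run
    while read -r host _; do
        # Skip blank lines and comment lines.
        case "$host" in ''|'#'*) continue ;; esac
        # Drop an optional ":ncpus" suffix from the host entry.
        host=${host%%:*}
        echo "== $host =="
        # Redirect stdin so ssh does not swallow the rest of the file.
        $runner "$host" ps -e f < /dev/null
    done < "$machines_file"
}
```

Calling `check_nodes machines.x86_64 echo` first only prints the commands that would be run; without the second argument it goes through ssh.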