bad job distribution with MPICH

Gary Stiehr stiehr at admiral.umsl.edu
Fri Jul 18 17:18:26 EDT 2003


Hi,

Try "mpirun -nolocal -np ....".  I think that without the "-nolocal" 
option, mpirun starts one process on node17, and that process then 
starts the other 7 processes on the 6 remaining processors outside 
node17, which leaves three processes on node15.  Apparently with 
-nolocal it will use all of the processors.  I'm not sure why this is, 
but adding "-nolocal" to the mpirun command may help you.
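
As a rough sketch of what I mean (untested on your cluster): the helper 
names build_mpirun_cmd and slots_per_host below are made up for 
illustration and are not part of MPICH or PBS.  The first only prints 
the mpirun line with "-nolocal" added, so you can inspect it before 
running it; the second counts how many slots PBS gave each node, to 
compare against what you actually see running.

```shell
#!/bin/sh
# Sketch only: build_mpirun_cmd and slots_per_host are hypothetical
# helper names, not part of MPICH or PBS.

# Print (do not run) the mpirun command line with -nolocal added,
# using one process per line of the machinefile.
build_mpirun_cmd() {
    nodefile=$1
    exe=$2
    # Some wc implementations pad the count with blanks; strip them.
    np=$(wc -l < "$nodefile" | tr -d '[:space:]')
    echo "mpirun -nolocal -np $np -machinefile $nodefile $exe"
}

# Print "count host" for every host listed in the machinefile.
slots_per_host() {
    sort "$1" | uniq -c | awk '{print $1, $2}'
}

# Inside a PBS job, PBS sets PBS_NODEFILE; mfix.exe is the executable
# from the original post.
if [ -n "${PBS_NODEFILE:-}" ]; then
    slots_per_host "$PBS_NODEFILE"
    build_mpirun_cmd "$PBS_NODEFILE" mfix.exe
fi
```

With the PBS_NODEFILE from your mail, slots_per_host should report 2 
slots for each of the four nodes, which is the distribution you would 
expect to see among the running processes.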

HTH,
Gary

Jan-Frode Myklebust wrote:

>Hi, 
>
>We're running MPICH 1.2.4 on a 32-node, dual-CPU Linux cluster (Fast
>Ethernet), and are having some problems with the MPICH job distribution. 
>An example from today:
>
>The PBS job:
>
>----------------------------------------
>#PBS -l nodes=4:ppn=2,walltime=100:00:00
>#
>mpirun -np `wc -l < $PBS_NODEFILE` -machinefile $PBS_NODEFILE mfix.exe
>----------------------------------------
>
>is assigned to nodes:
>
>	node17/0+node15/0+node14/0+node11/0+node17/1+node15/1+node14/1+node11/1
>
>PBS generates a PBS_NODEFILE containing:
>
>-----------------------------
>node17
>node15
>node14
>node11
>node17
>node15
>node14
>node11
>-----------------------------
>
>And this command is started on node17:
>
>	mpirun -np 8 -machinefile /var/spool/PBS/aux/20996.fire executable
>
>And then when I look over the nodes, there's 1 executable running on
>node17, 3 on node15, 2 on node14 and 2 on node11.
>
>Anybody seen something like this, and maybe have an idea of what might 
>be causing it?
>
>
>  -jf
>_______________________________________________
>Beowulf mailing list, Beowulf at beowulf.org
>To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
>  
>




