bad job distribution with MPICH

Eduardo Cesar Cabrera Flores eccf at super.unam.mx
Thu Jul 17 17:11:55 EDT 2003



You should try mpiexec instead of mpirun.
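
A minimal sketch of what the job script might look like, assuming the suggestion refers to the PBS-aware mpiexec launcher (the one that starts processes through the PBS task manager interface rather than rsh) and that it is installed and on the PATH of the compute nodes. The resource line and executable name are taken from the job quoted below; everything else is an assumption.

```shell
#PBS -l nodes=4:ppn=2,walltime=100:00:00
#
# Assumption: a PBS-aware mpiexec is installed on the compute nodes.
# It obtains the allocated nodes from PBS itself, so neither -np nor
# -machinefile is passed here; by default it should start one process
# per allocated processor.
mpiexec mfix.exe
```

Because the launcher takes the node list from PBS directly, the placement should match what PBS assigned, which is the point of trying it here.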

                      
cafe



                                                                  
Hi,                                                                                     
                                                                                        
we're running MPICH 1.2.4 on a 32-node dual-CPU Linux cluster (Fast
Ethernet), and are having some problems with the MPICH job distribution.
An example from today:                                                                  
                                                                                        
The PBS job:                                                                            
                                                                                        
----------------------------------------                                                
#PBS -l nodes=4:ppn=2,walltime=100:00:00                                                
#                                                                                       
mpirun -np `wc -l < $PBS_NODEFILE` -machinefile $PBS_NODEFILE mfix.exe                  
----------------------------------------                                                
                                                                                        
is assigned to nodes:                                                                   
                                                                                        
        node17/0+node15/0+node14/0+node11/0+node17/1+node15/1+node14/1+node11/1
                                                                                        
PBS generates a PBS_NODEFILE containing:                                                
                                                                                        
-----------------------------                                                           
node17                                                                                  
node15                                                                                  
node14                                                                                  
node11                                                                                  
node17                                                                                  
node15                                                                                  
node14                                                                                  
node11                                                                                  
-----------------------------                                                           
                                                                                        
And this command is started in node 17:                                                 
                                                                                        
        mpirun -np 8 -machinefile /var/spool/PBS/aux/20996.fire executable              
                                                                                        
And then when I look over the nodes, there's 1 executable running on                    
node17, 3 on node15, 2 on node14 and 2 on node11.                                       
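
To make the mismatch concrete, here is a small sketch that tallies how many processes each node should receive according to the nodefile. The helper name count_slots is hypothetical and not part of PBS or MPICH; it only assumes the standard sort, uniq and awk tools.

```shell
# Count how many slots each node has in a PBS-style nodefile
# (one hostname per line, repeated once per assigned processor).
# count_slots is a hypothetical helper, not a PBS or MPICH command.
count_slots() {
    sort "$1" | uniq -c | awk '{ print $2, $1 }'
}

# Inside a running PBS job one would call:
#   count_slots "$PBS_NODEFILE"
```

For the nodefile shown above this lists each of node11, node14, node15 and node17 with a count of 2, whereas the observed placement was 1, 3, 2 and 2.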
                                                                                        
Anybody seen something like this, and maybe have an idea of what might 




_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf


