Help! MPI calls not responding...

Timothy R. Whitcomb twhitcomb at apl.washington.edu
Thu Jul 10 16:52:50 EDT 2003


We are trying to run the Navy's COAMPS atmospheric model on a Scyld
Beowulf cluster, using the Portland Group FORTRAN compiler.  The
cluster consists of five nodes, each with dual AMD processors.

After some modification to the supplied Makefile, the software now
compiles and links fully.  The Makefile was modified to use the
following library options for the link step:
-----------------------------------------------
"EXTRALIBS= -L/usr/lib -lmpi -lmpich -lpmpich -lbproc -lbpsh -lpvfs
-lbeomap -lbeostat -ldl -llapack -lblas -lparpack_LINUX
-L/usr/coamps3/lib -lfnoc -L/usr/lib/gcc-lib/i386-redhat-linux/2.96 -lg2c"
-----------------------------------------------

However, when we try to run the code using
mpirun -allcpus atmos_forecast.exe
or
mpprun -allcpus atmos_forecast.exe
in a Perl script, it gives the following error:
-----------------------------------------------
Fatal error; unknown error handler
May be MPI call before MPI_INIT.  Error message is MPI_INIT and code is 208
Fatal error; unknown error handler
May be MPI call before MPI_INIT.  Error message is MPI_COMM_RANK and
code is 197
Fatal error; unknown error handler
May be MPI call before MPI_INIT.  Error message is MPI_COMM_SIZE and
code is 197
NOT ENOUGH COMPUTATIONAL PROCESSES
Fatal error; unknown error handler
May be MPI call before MPI_INIT.  Error message is MPI_ABORT and code is 197
Fatal error; unknown error handler
May be MPI call before MPI_INIT.  Error message is MPI_BARRIER and code is 197
-----------------------------------------------
where NOT ENOUGH COMPUTATIONAL PROCESSES is a program message
indicating that you've requested more processors than are
available.  The offending section of code is:
-----------------------------------------------
      call MPI_INIT(ierr_mpi)
      call MPI_COMM_RANK(MPI_COMM_WORLD, npr, ierr_mpi)
      call MPI_COMM_SIZE(MPI_COMM_WORLD, nprtot, ierr_mpi)
-----------------------------------------------

I modified this code to add a call to MPI_INITIALIZED after the
MPI_INIT call, which indicated that MPI_INIT simply was not
working.
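For reference, the check I added looked roughly like this (a sketch, not
the actual COAMPS code; the `initialized` flag and the print are my
additions):
-----------------------------------------------
      logical initialized
      integer ierr_mpi

      call MPI_INIT(ierr_mpi)
c     MPI_INITIALIZED sets its flag to .true. only if MPI_INIT
c     has completed successfully; it is one of the few MPI calls
c     that is legal before MPI_INIT.
      call MPI_INITIALIZED(initialized, ierr_mpi)
      if (.not. initialized) then
         print *, 'MPI_INIT did not complete'
      end if
-----------------------------------------------
In our runs, the flag came back false.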

If it makes any difference, I can run the Beowulf demos (like
mpi-mandel or linpack) just fine across multiple processors.

What is going on here, and how do we fix it? We're new to cluster
computing, and this is getting over our heads.  I've tried to supply
the information I thought was relevant, but as this project keeps
proving, what I think doesn't do me much good.

Thanks in advance...

Tim Whitcomb
twhitcomb at apl.washington.edu
University of Washington Applied Physics Laboratory

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf


