Running 4 MPI Processes on the Localhost
Now that we have a hello MPI program compiled, how do we run it in parallel? Each implementation comes with an mpirun program designed to launch MPI applications. In this section, we'll simply launch 4 MPI processes on the localhost (a common debugging/development scenario).
FT-MPI requires starting up a run-time environment (RTE) before launching MPI applications. There are multiple ways to do this; one way is to use the FT-MPI console:
$ console
con> add localhost
con> spawn -np 4 -mpi hello.ft-mpi
To have LA-MPI's mpirun launch locally, it is easiest to set the LAMPI_LOCAL environment variable to 1 and use the -np switch to request the number of processes to run:
$ export LAMPI_LOCAL=1
$ mpirun -np 4 ./hello.la-mpi
LAM/MPI requires its RTE to be started with the command lamboot before using mpirun. To start the RTE on just the localhost, invoke lamboot with no arguments. mpirun can then be used with the same -np switch to indicate how many MPI processes to launch:
$ lamboot
$ mpirun -np 4 hello.lam-mpi
Similar to LAM/MPI, MPICH has a daemon-based RTE, but most MPICH installations still default to the ubiquitous rsh/ssh-based startup mechanisms. In this configuration, MPICH's mpirun will always use a hostfile to specify which hosts to run on. If one is not supplied on the command line, a default file (created when MPICH was installed) will be used. For this example, create a text file named my_mpich_hostfile containing the single line "localhost". Then use the -machinefile switch to specify your hostfile, along with the -np switch:
$ cat my_mpich_hostfile
localhost
$ mpirun -machinefile my_mpich_hostfile -np 4 hello.mpich
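The hostfile can also be created straight from the shell rather than in an editor. The following is a minimal sketch, assuming a POSIX shell and the my_mpich_hostfile name used above; the mpirun line is left as a comment because it only works where MPICH is installed.

```shell
# Write a one-line hostfile naming only the local machine.
printf 'localhost\n' > my_mpich_hostfile

# Confirm the contents before handing the file to mpirun.
cat my_mpich_hostfile

# With MPICH installed, the launch command from the text would then be:
# mpirun -machinefile my_mpich_hostfile -np 4 hello.mpich
```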
Running 4 MPI Processes on 2 Dual Processor SMPs
FT-MPI will run across as many hosts as are running in its RTE. If you add multiple hosts, it will launch on all of them, placing adjacent ranks in MPI_COMM_WORLD on the same node:
con> add node1.example.com
con> add node2.example.com
con> spawn -np 4 -mpi hello.ft-mpi
The haltall console command shuts down the FT-MPI RTE.
LA-MPI allows the specification of process counts and hosts on the mpirun command line. For example:
$ mpirun -N 2 -H node1.example.com,node2.example.com -n 2,2 hello.la-mpi
The -N switch says to use 2 hosts, -H provides a comma-separated list of hosts, and -n specifies how many processes to start on each host.
LAM/MPI allows flexible specification of process placement via both the hostfile given to lamboot and the mpirun command line.
$ cat my_lam_hostfile
node1.example.com cpu=2
node2.example.com cpu=2
$ lamboot my_lam_hostfile
$ mpirun C hello.lam-mpi
When you are finished with LAM's RTE, shut it down with the lamhalt command.
Each host is listed once in my_lam_hostfile, followed by a cpu tag indicating how many CPUs it has (i.e., how many processes LAM should start on that machine). Instead of -np, use C on the mpirun command line, telling LAM to start on "all available CPUs." LAM will automatically place adjacent ranks of MPI_COMM_WORLD on the same node. Using C can be ideal, for example, in batch environments where the number of target processes may vary.
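The by-CPU placement described above can be modeled in a few lines of shell. This is only a sketch of the behavior as described, not LAM's implementation: by_cpu_placement is a hypothetical helper, and treating a host with no cpu= tag as having one CPU is an assumption.

```shell
# Hypothetical helper (not part of LAM/MPI): prints the host each rank
# would land on if ranks are handed out host by host, one per listed CPU.
by_cpu_placement() {
    hostfile=$1
    rank=0
    while read -r host tag; do
        [ -n "$host" ] || continue        # skip blank lines
        cpus=${tag#cpu=}                  # "cpu=2" becomes "2"
        [ "$cpus" != "$tag" ] || cpus=1   # assumption: no cpu= tag means 1 CPU
        i=0
        while [ "$i" -lt "$cpus" ]; do
            echo "rank $rank -> $host"
            rank=$(( rank + 1 ))
            i=$(( i + 1 ))
        done
    done < "$hostfile"
}

# Same hostfile contents as in the example above.
printf 'node1.example.com cpu=2\nnode2.example.com cpu=2\n' > my_lam_hostfile
by_cpu_placement my_lam_hostfile
```

For the two dual-processor hosts, the sketch assigns ranks 0 and 1 to node1.example.com and ranks 2 and 3 to node2.example.com, four processes in total without ever stating the number 4.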
MPICH also requires a hostfile:
$ cat my_mpich_hostfile
node1.example.com
node2.example.com
$ mpirun -machinefile my_mpich_hostfile -np 4 hello.mpich
MPICH will run on each host in the machinefile in round-robin fashion for the number of processes specified by the -np parameter (hosts can be listed more than once to force adjacent ranks in MPI_COMM_WORLD to be on the same node).
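To see why listing a host more than once changes which ranks share a node, the round-robin rule can be modeled in shell. This is a sketch of the rule as stated above, not MPICH's actual startup code; round_robin_placement, rr_hostfile, and rr_hostfile_twice are hypothetical names used only for illustration.

```shell
# Hypothetical helper (not part of MPICH): prints the host each rank would
# land on if hosts are taken from the machinefile in simple round-robin order.
round_robin_placement() {
    machinefile=$1
    np=$2
    nhosts=$(grep -c . "$machinefile")    # count non-empty lines
    [ "$nhosts" -gt 0 ] || return 1       # nothing to place ranks on
    rank=0
    while [ "$rank" -lt "$np" ]; do
        line=$(( rank % nhosts + 1 ))     # wrap around; sed counts from 1
        host=$(grep . "$machinefile" | sed -n "${line}p")
        echo "rank $rank -> $host"
        rank=$(( rank + 1 ))
    done
}

# Each host listed once: consecutive ranks alternate between the nodes.
printf 'node1.example.com\nnode2.example.com\n' > rr_hostfile
round_robin_placement rr_hostfile 4

# Each host listed twice in a row: adjacent ranks share a node.
printf 'node1.example.com\nnode1.example.com\nnode2.example.com\nnode2.example.com\n' > rr_hostfile_twice
round_robin_placement rr_hostfile_twice 4
```

With each host listed once, ranks 0 and 2 fall on node1.example.com and ranks 1 and 3 on node2.example.com. With each host listed twice in a row, ranks 0 and 1 share node1.example.com and ranks 2 and 3 share node2.example.com.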
Where To Go From Here?
So MPI is MPI is MPI, but not all MPI implementations are created equal. Every MPI implementation differs in minor ways, even in how applications are compiled and run. Despair not: even though the differences are typically annoying, they're nothing that users can't figure out with a few minutes' perusal of a man page.
If you ran the hello.c program from the last column, you may have noticed that the output order was not as expected. There is no guarantee of output order based on rank (unless specifically programmed as part of the operation). As we have seen, process placement can vary from run to run depending upon how your MPI is configured, and this can affect output order as well. Parallel input and output will be addressed in a future column.
|MPI Forum (MPI-1 and MPI-2 specifications documents)||http://www.mpi-forum.org|
|MPI - The Complete Reference: Volume 1, The MPI Core (2nd ed) (The MIT Press)||By Marc Snir, Steve Otto, Steven Huss-Lederman, David Walker, and Jack Dongarra. ISBN 0-262-69215-5.|
|MPI - The Complete Reference: Volume 2, The MPI Extensions (The MIT Press)||By William Gropp, Steven Huss-Lederman, Andrew Lumsdaine, Ewing Lusk, Bill Nitzberg, William Saphir, and Marc Snir. ISBN 0-262-57123-4.|
This article was originally published in ClusterWorld Magazine. It has been updated and formatted for the web. If you want to read more about HPC clusters and Linux, you may wish to visit Linux Magazine.
Jeff Squyres is the Assistant Director for High Performance Computing for the Open Systems Laboratory at Indiana University and is one of the lead technical architects of the Open MPI project.