Running the MPI Application with LAM/MPI
Before parallel MPI programs can be run, the LAM/MPI run-time environment must be started (or "booted") with the lamboot command. For simplicity's sake, this example assumes a traditional rsh/ssh, Beowulf-style cluster where the user can log in to all nodes without interactively providing a password or passphrase. If you are required to enter a password, you will have difficulty with the following steps.

lamboot expects an argument specifying the name of a boot schema file (or "hostfile") indicating the nodes on which to launch the run-time environment. The simplest boot schema file is a text file with one hostname (or IP address) per line. Consider the following boot schema file (named "myhosts"):
node1.example.com
node2.example.com
node3.example.com
node4.example.com
Run the lamboot command with myhosts as an argument:
$ lamboot myhosts
When lamboot completes successfully, the LAM run-time environment has been booted on all the nodes listed in myhosts and is ready to run MPI programs. The mpirun command is used to launch MPI applications. The C switch tells LAM to launch one MPI process per "CPU" in the run-time environment; CPU counts are taken from the boot schema file, and if none are indicated -- as in this example -- one CPU per node is assumed. For example, the following launches four MPI "hello" processes:
$ mpirun C hello
Hello, world. I am 0 of 4.
Hello, world. I am 1 of 4.
Hello, world. I am 2 of 4.
Hello, world. I am 3 of 4.
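For reference, a minimal MPI program producing this kind of output might look like the following sketch (the actual hello.c compiled earlier may differ slightly):

/* hello.c -- a minimal MPI "hello, world" program */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("Hello, world. I am %d of %d.\n", rank, size);
    MPI_Finalize();
    return 0;
}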
Both mpirun and lamboot support many more command line features, options, and modes of operation; be sure to see their respective manual pages for more details.
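For example, if the nodes in your cluster have more than one processor, the boot schema can say so with a cpu key. Here is a sketch assuming dual-processor nodes (check the bhost(5) manual page for the exact syntax your version of LAM supports):

node1.example.com cpu=2
node2.example.com cpu=2
node3.example.com cpu=2
node4.example.com cpu=2

With this boot schema, mpirun C hello would launch eight processes instead of four. When you are finished, the LAM run-time environment can be shut down with the lamhalt command.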
Running the MPI Application with MPICH
Running a program with MPICH is not quite as involved as it is with LAM/MPI. Check that your execution path and environment variables are set correctly for your version of MPICH. To run the program under MPICH, you will need a machine file that looks strikingly similar to the LAM boot schema file:
node1.example.com
node2.example.com
node3.example.com
node4.example.com
Again, you must be able to log in to the nodes in your machine file without entering a password. To run the MPICH-compiled program, simply create a machine file called machines containing the names of the machines in your cluster, then execute the following:
$ mpirun -np 2 -machinefile machines hello
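Assuming the same hello program, this launches two processes on the first two machines listed in the file, producing output along these lines (the ordering may vary):

Hello, world. I am 0 of 2.
Hello, world. I am 1 of 2.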
As with the LAM example, the order of the output is not guaranteed, and you may see the output interleaved in odd ways under either implementation. We will talk about this and some other runtime issues next month.
Sidebar One: Should you parallelize your application?
Before parallelizing your application, it is advisable to perform some
analysis to see if it will benefit from parallelization. Generally
speaking, the point of parallelizing an application is to make it run
faster. Hence, it only makes sense to parallelize an application if:

- it takes long enough to run that a speedup would be worthwhile, and/or
- its work can be divided into multiple, semi-independent parts.

If neither of the above conditions is met, the overhead added by parallelization may actually cause the application to run slower. For example, there is no point in parallelizing an application that only takes a few seconds to run. However, an application that takes several hours to run in serial and can easily have its work divided into multiple, semi-independent parts is probably a good candidate for parallelization.
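As a rough, illustrative calculation: Amdahl's Law estimates the speedup from parallelizing a fraction p of a program's work across n processors as 1 / ((1 - p) + p / n). If 90% of a 10-hour job can be spread across four nodes, the run time drops to about 10 x (0.1 + 0.9 / 4) = 3.25 hours. A job that takes five seconds, by contrast, gains nothing once startup overhead is counted.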
Sidebar Two: MPI function notation
The MPI standard defines all MPI functions in three languages: C, C++,
and Fortran. Every function defined by the MPI standard will
necessarily have a different binding in each language. For example,
the bindings for the MPI initialization function are listed in the
"MPI Initialization Function Language Bindings" table.
To refer to the MPI function without referring to a specific language binding, the MPI standard uses all capital letters: MPI_INIT.
Sidebar Three: MPI processes vs. ranks
Many MPI programmers tend to refer to MPI processes as "ranks". This
is not technically correct: a rank is only meaningful relative to a particular communicator, and a single MPI process may have a different rank in each communicator to which it belongs.
A single rank value may therefore refer to multiple different MPI processes. For example, it is not correct to say "send to rank 0"; it is more correct to say "send to MPI_COMM_WORLD rank 0".

Unfortunately, even [communicator, rank] pairs are not always unique. It is easy to imagine cases where multiple communicators containing disjoint sets of processes are referred to by the same variable. Consider a communicator referred to by a variable named row: "row rank 0" therefore does not necessarily refer to a unique MPI process. In this case, it is typically safer to refer to the MPI process through its MPI_COMM_WORLD rank.

But the situation becomes even more complicated when we introduce the concept of MPI-2 dynamic processes -- where it is possible to have multiple, simultaneous instances of MPI_COMM_WORLD with disjoint sets of processes. Dynamic MPI processes -- and the issues surrounding them -- will be explored in a future edition of this column.
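To make this concrete, here is a short sketch (the communicator and variable names are illustrative, not from any particular application) in which one rank value refers to two different processes:

/* rowrank.c -- demonstrates that a rank is only meaningful relative
   to a communicator.  Run with four processes. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int world_rank, row_rank;
    MPI_Comm row;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    /* Split MPI_COMM_WORLD into "rows" of two processes each; each
       row is a separate communicator with a disjoint set of
       processes. */
    MPI_Comm_split(MPI_COMM_WORLD, world_rank / 2, world_rank, &row);
    MPI_Comm_rank(row, &row_rank);

    /* With four processes, MPI_COMM_WORLD ranks 0 and 2 both report
       row rank 0 -- the rank alone does not identify a process. */
    printf("I am MPI_COMM_WORLD rank %d, row rank %d\n",
           world_rank, row_rank);

    MPI_Comm_free(&row);
    MPI_Finalize();
    return 0;
}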
Sidebar Four: MPI Resources
The MPI Forum (http://www.mpi-forum.org/)
MPI -- The Complete Reference: Volume 1, The MPI Core (2nd ed), by Marc Snir, Steve Otto, Steven Huss-Lederman, David Walker, and Jack Dongarra. The MIT Press. ISBN 0-262-69215-5.

MPI -- The Complete Reference: Volume 2, The MPI Extensions, by William Gropp, Steven Huss-Lederman, Andrew Lumsdaine, Ewing Lusk, Bill Nitzberg, William Saphir, and Marc Snir. The MIT Press. ISBN 0-262-57123-4.
This article was originally published in ClusterWorld Magazine. It has been updated and formatted for the web. If you want to read more about HPC clusters and Linux, you may wish to visit Linux Magazine.
Jeff Squyres is a research associate at Indiana University and is the lead developer for the LAM implementation of MPI.