|
Page 1 of 2
There once was a man from Nantucket
Whose PVM code kicked the bucket
He ranted and raved
"Oh what can I save?"
Then he re-wrote his application MPI and used MPI_COMM_SPAWN and his life became fundamentally better.
The Story So Far
One of the big additions in MPI-2 is the concept of dynamic
processes. However, early uses of it were rather mundane and, truth be
told, unnecessary. Indeed, dynamic processes were added to MPI, at
least in part, as a political necessity since one of the more
important parallel run-time systems prior to MPI included the ability
to spawn new processes at run-time. Let's take a trip back in history
to examine one of MPI's predecessors, the Parallel Virtual Machine
(PVM)...
PVM
PVM was a project out of the Oak Ridge National Laboratory (ORNL) in
the early 90's and was one of the first truly portable parallel
run-time systems. PVM allowed scientists and engineers to develop
parallel codes on their workstations and then run them on "big iron"
production machines. PVM became enormously popular and enjoyed a
large, fervent user base.
Although PVM had the capability to launch multiple processes
simultaneously and bind them together into a job, most users tended to
prefer a different model for launching their parallel
applications. They would simply launch a single, serial process
(e.g., ./a.out) and use PVM's spawning capabilities to launch
the rest of the processes required for the parallel
application. Indeed, this became the de facto method of
launching parallel applications in PVM.
MPI's design was strongly influenced by PVM (among others). However,
for a variety of technical reasons, the ability to spawn new processes
was left out of the initial MPI specification (MPI-1). Although the
MPI-1 standard does not specify how to start a parallel job, most
implementations launch a set of processes together using an
implementation-dependent mechanism, frequently a command
named mpirun. This set of processes comprises the
MPI_COMM_WORLD communicator. The size and composition
of MPI_COMM_WORLD is fixed upon initiation: no processes can
be added to or removed from MPI_COMM_WORLD.
Sidebar:
MPI-2
|
In 1994, the MPI Forum re-convened to add on to the MPI-1
standard. Several large topics were proposed for inclusion: parallel
I/O, new language bindings, one-sided operations, and dynamic
processes (to include spawning).
Although strong technical cases were not initially presented as
to why dynamic processes needed to be included in the MPI-2
standard, it was seen as a political necessity to address the PVM
community's concerns. In typical MPI fashion, the MPI-2 standard
includes not only spawning, but a total of three different
models for dynamic process management (three is better than one,
right?).
Initial implementations of the MPI-2 dynamic process control models
started appearing around 1997. First uses of it were pretty much a
direct port of the PVM start-one-process-that-starts-all-the-rest
model. These were mainly comprised of PVM users porting their
applications to MPI in order to take advantage of low latency networks
or utilize vendor-tuned MPI implementations. It was only within the
last few years that more interesting uses have become possible through
mature, thread-safe implementations of the MPI dynamic process models.
|
The PVM community scoffed at this aspect of MPI-1 - why should a
parallel application be limited in the number of processes that it
could have?
Even though the vast majority of PVM applications only used spawning
capabilities to launch their initial job, and even though MPI
implementations could support parallel applications as large as PVM
(if not larger), this misconception on the part of many PVM users
slowed the initial adoption of MPI. Ironically, the startup mechanism
in MPI is simpler than PVM's
launch-one-process-that-launches-all-the-rest model. Specifically, the
typical PVM model requires that the user write the spawning code. MPI
implementations' built-in mpirun commands (or equivalent)
handled most of the same functionality.
These facts were lost in the Great MPI/PVM Religious Debates of the
early- and mid-90's.
Admittedly, I'm presenting the MPI view of most of the arguments. But
the fact remains that MPI was built upon the shoulders of PVM and used
many of its good ideas (indeed, the PVM developers were on the MPI
Forum). Spawning simply was [initially] not one of them. Three
different dynamic process models were later added in the MPI-2
standard.
Spawning New Processes
The first model is, unsurprisingly, spawning new processes. Keep in
mind, however, that MPI-2 was intended to be extensions to
MPI-1 - not changes. So if the static model of a
fixed MPI_COMM_WORLD remains, what does spawning new
processes mean in MPI?
In short, it means launching another MPI_COMM_WORLD. Spawning
is a collective action, meaning the processes in a
communicator must unanimously decide to launch a new set of
processes. That is, they all invoke the
function MPI_COMM_SPAWN (or MPI_COMM_SPAWN_MULTIPLE)
and instruct MPI to launch a new MPI job that has its
own MPI_COMM_WORLD.
Listing 1:
Sample spawn
1 int rank, err[4];
2 MPI_Comm children;
3 MPI_Comm_rank(MPI_COMM_WORLD, &rank);
4 MPI_Comm_spawn("child", NULL, 4, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &children, err);
5 if (0 == rank) {
6 MPI_Send(&rank, 1, MPI_INT, 0, 0, children);
7 }
|
The code snippet in Listing 1 launches
four copies of an executable named "child" collectively
across the processes in the spawning
job's MPI_COMM_WORLD. This action creates a new MPI
job with its own MPI_COMMW_WORLD, containing four
processes. That is, at the end of MPI_COMM_SPAWN, there will
be two MPI_COMM_WORLD instances - one per job. Each
will have ranks 0 through (number of processes - 1).
Communication is established between the two jobs through
an intercommunicator - the children argument
to MPI_COMM_SPAWN, above. An intercommunicator is similar to
an intracommunicator (i.e., a "normal" communicator, such
as MPI_COMM_WORLD) except that it contains two "groups." In
this case, one group is the spawning process; the other group is the
spawned process. When using intercommunicators, the peer argument of
all communication calls is always expressed in terms of
the other group. Hence, line 6 in Listing 1 is sending to rank 0 of
the children's group.
Listing 2:
Sample spawned child
1 #include "mpi.h"
2 int main(int argc, &argv) {
3 int rank, msg;
4 MPI_Comm parent;
5 MPI_Init(&argc, &argv);
6 MPI_Comm_get_parent(&parent);
7 MPI_Comm_rank(MPI_COMM_WORLD, &rank);
8 if (0 == rank) {
9 MPI_Recv(&msg, 1, MPI_INT, 0, 0, parent, MPI_STATUS_IGNORE);
10 }
|
The newly-spawned application can call MPI_COMM_GET_PARENT to
obtain this communicator (see Listing 2). Again, since
communication with intercommunicators is expressed in terms of the
other group, the peer argument given to MPI_RECV on line 9
in Listing 2 is rank 0 of
the parent's group. Hence, it is receiving the message sent
from the MPI_SEND on line 6 in Listing 1.
Some applications are flexible in that they may be run directly (e.g.,
via mpirun) or they may be spawned. If an application was
spawned, a valid communicator will be returned
from MPI_COMM_GET_PARENT. If it was
not, MPI_COMM_NULL will be returned (i.e., there is no parent
because it was not spawned).
The MPI_COMM_SPAWN_MULTIPLE function behaves the same
as MPI_COMM_SPAWN, except that it allows launching an array
of different executables and command line arguments in a
single MPI_COMM_WORLD - a multiple process, multiple data
(MPMD) style of launching.
|