Spawnworthy? Who are these people?
I like March: it's a month of celebration, a time when everyone around the
world unites in joy and praise. Songs, dancing, potluck dinners,
t-shirts, etc. I'm speaking of Pi Day, of course, on 3/14. In case you missed it this year, don't forget
that Pi Approximation Day (July 22, or 22/7) is fast approaching. In honor of this number, I suggest that all HPC
users contribute to the Pi revelry by computing and reciting as
much of Pi as possible. What better way to do this than to optimize
approximate computations of Pi in parallel? Hint: the first three digits are in this paragraph.
The Story So Far
Last column, we outlined the three models of dynamic processes in MPI:
- spawning new processes using MPI_COMM_SPAWN and
  MPI_COMM_SPAWN_MULTIPLE;
- client/server connections between existing MPI processes using
  MPI_COMM_ACCEPT and MPI_COMM_CONNECT (and the supporting
  functions MPI_OPEN_PORT, MPI_PUBLISH_NAME, MPI_LOOKUP_NAME,
  and MPI_CLOSE_PORT); and
- joining existing MPI processes over an independently established
  socket using MPI_COMM_JOIN.
All of these models are synchronous, meaning that they block until the
action is complete. With some strong caveats about scheduled
environments (discussed last column), the SPAWN functions will
likely complete more or less immediately (i.e., they will probably
take about as long as the MPI implementation's job startup mechanism,
such as mpirun). Hence, SPAWN will usually block for a short
while, but it completes in finite time. JOIN, while fundamentally
asynchronous in nature, is likely to be used mainly in synchronous
situations. Specifically, since a TCP socket must be established before
JOIN is invoked, the asynchronous part of connecting two
previously existing processes (the rendezvous) is handled elsewhere,
and JOIN will likely be invoked right after the socket has
been established. So JOIN is also likely to be used in
time-bounded situations.
ACCEPT and CONNECT, however, are different: they are
fundamentally asynchronous, both in nature and in use. The "server"
process blocks in ACCEPT until a "client" process calls a
corresponding CONNECT. Since the client is likely to
be independent of the server, when (or even whether) the client
invokes CONNECT is effectively random. The server can therefore
block indefinitely, which is unacceptable for most
single-threaded applications and for MPI implementations that do
not support multiple threads.
Threads To The Rescue
ACCEPT works best when it can be left blocking in an
independent thread. This thread can simply loop
over MPI_COMM_ACCEPT, accepting client connections and
dispatching them to other parts of the server on demand. This design
is quite similar to how many traditional client/server applications are
implemented. Meanwhile, the rest of the server process can continue with
other meaningful work and be interrupted by client requests only as necessary.
A side effect of this approach (and of the MPI design) is that
a pending ACCEPT cannot be interrupted or cancelled cleanly. To
shut down the server process, a dummy connection must be made to
the server's pending ACCEPT (probably originating from within
the server process itself) and used to send a command telling the
accepting thread to break out of its ACCEPT loop and exit. This trick is
necessary because it is erroneous for an ACCEPT to be pending
when another thread in the server invokes MPI_FINALIZE.
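The accept-thread design described above, including the dummy-connection shutdown trick, can be sketched in C with POSIX threads. To keep the sketch self-contained and runnable without an MPI installation, it substitutes a hypothetical in-process queue for the real calls: queue_accept stands in for MPI_COMM_ACCEPT and queue_connect for MPI_COMM_CONNECT. Every name here (listen_queue_t, run_server_demo, the "work" and "shutdown" commands) is an illustrative assumption, not an MPI or Open MPI API. In a real server, the acceptor thread would instead call MPI_Comm_accept on a port name obtained from MPI_Open_port, and MPI would need to be initialized with the MPI_THREAD_MULTIPLE thread level.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

/* HYPOTHETICAL stand-in for MPI_COMM_ACCEPT / MPI_COMM_CONNECT: an
 * in-process FIFO of pending "connections", each carrying one command. */
#define QUEUE_CAP 8
#define CMD_LEN 16

typedef struct {
    char cmds[QUEUE_CAP][CMD_LEN];
    int head, tail, count;
    pthread_mutex_t lock;
    pthread_cond_t nonempty, nonfull;
} listen_queue_t;

static void queue_init(listen_queue_t *q) {
    q->head = q->tail = q->count = 0;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->nonempty, NULL);
    pthread_cond_init(&q->nonfull, NULL);
}

static void queue_destroy(listen_queue_t *q) {
    pthread_mutex_destroy(&q->lock);
    pthread_cond_destroy(&q->nonempty);
    pthread_cond_destroy(&q->nonfull);
}

/* Client side (plays the role of CONNECT): hand the server a command. */
static void queue_connect(listen_queue_t *q, const char *cmd) {
    pthread_mutex_lock(&q->lock);
    while (q->count == QUEUE_CAP)
        pthread_cond_wait(&q->nonfull, &q->lock);
    strncpy(q->cmds[q->tail], cmd, CMD_LEN - 1);
    q->cmds[q->tail][CMD_LEN - 1] = '\0';
    q->tail = (q->tail + 1) % QUEUE_CAP;
    q->count++;
    pthread_cond_signal(&q->nonempty);
    pthread_mutex_unlock(&q->lock);
}

/* Server side (plays the role of ACCEPT): block until a client arrives. */
static void queue_accept(listen_queue_t *q, char *out) {
    pthread_mutex_lock(&q->lock);
    while (q->count == 0)
        pthread_cond_wait(&q->nonempty, &q->lock);
    memcpy(out, q->cmds[q->head], CMD_LEN);
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;
    pthread_cond_signal(&q->nonfull);
    pthread_mutex_unlock(&q->lock);
}

typedef struct {
    listen_queue_t queue;
    int handled; /* ordinary requests dispatched by the acceptor */
} server_t;

/* Acceptor thread: loop over the blocking "accept" until told to stop. */
static void *acceptor_main(void *arg) {
    server_t *s = arg;
    char cmd[CMD_LEN];
    for (;;) {
        queue_accept(&s->queue, cmd);
        if (strcmp(cmd, "shutdown") == 0)
            break;    /* the dummy connection: leave the accept loop */
        s->handled++; /* placeholder for dispatching the request */
    }
    return NULL;
}

/* Start the acceptor, send it n_clients requests, then stop it with a
 * connection made from inside the server itself. Returns the number of
 * requests handled, or -1 if the thread could not be created. */
static int run_server_demo(int n_clients) {
    server_t s;
    pthread_t tid;
    assert(n_clients >= 0);
    queue_init(&s.queue);
    s.handled = 0;
    if (pthread_create(&tid, NULL, acceptor_main, &s) != 0) {
        queue_destroy(&s.queue);
        return -1;
    }
    for (int i = 0; i < n_clients; i++)
        queue_connect(&s.queue, "work");
    queue_connect(&s.queue, "shutdown");
    pthread_join(tid, NULL); /* acceptor gone: no "accept" is pending */
    queue_destroy(&s.queue);
    return s.handled;
}
```

For example, run_server_demo(3) returns 3: the queue is first-in, first-out, so the acceptor dispatches all three "work" requests before it sees the shutdown command and exits. Only after pthread_join returns is it certain that no accept is pending, which is the point at which a real MPI server could safely call MPI_FINALIZE.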
Note that not all MPI implementations support both ACCEPT/CONNECT
(or MPI-2 dynamic processes in general) and multi-threaded MPI
applications (which, for the design above, means the MPI_THREAD_MULTIPLE
thread level). The MPI implementation that I work on, Open MPI, does,
and it is the basis for the examples provided in this column.
Sidebar:
MPI Connected
MPI formally defines the communication status between two processes:
they are either "connected" or "disconnected" (MPI-2, section 5.5.4):
Two processes are connected if there is a communication path
(direct or indirect) between them. More precisely:
- Two processes are connected if:
  - they both belong to the same communicator (inter- or intra-,
    including MPI_COMM_WORLD), or
  - they have previously belonged to a communicator that was
    freed with MPI_COMM_FREE instead of MPI_COMM_DISCONNECT, or
  - they both belong to the group of the same window or
    filehandle.
- If A is connected to B and B to C, then A is connected to C.
Two processes are disconnected (also independent) if
they are not connected.
As such, the state of being "connected" is transitive. This transitivity
has implications for MPI_ABORT (used to abort MPI
processes), run-time MPI error handling, and MPI_FINALIZE
(used to shut down an MPI process). MPI_ABORT
and the MPI_ERRORS_ARE_FATAL error handler are allowed (but not required)
to abort all connected processes. MPI_FINALIZE is collective across
all connected processes. Hence, to ensure that processes do
not unintentionally block in MPI_FINALIZE, it is a good idea
for dynamic processes to DISCONNECT when communication
between them is no longer required.
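Because transitivity is easy to underestimate, a small model may help. This is a hypothetical sketch, not an MPI API: processes are given global ids 0..n-1 (not MPI ranks), each communicator, window, or file handle is recorded as a group of ids, and a union-find structure computes the transitive closure that the second rule requires. The names conn_init, conn_add_group, and conn_connected are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* HYPOTHETICAL model of MPI-2's "connected" relation; not an MPI API.
 * Processes get global ids 0..n-1 (these are not MPI ranks). */
#define MAX_PROCS 64

typedef struct {
    int parent[MAX_PROCS];
    int n;
} conn_model_t;

/* Initially every process is independent of every other. */
static void conn_init(conn_model_t *m, int n) {
    assert(n > 0 && n <= MAX_PROCS);
    m->n = n;
    for (int i = 0; i < n; i++)
        m->parent[i] = i;
}

/* Union-find lookup with path halving; returns the set representative. */
static int conn_find(conn_model_t *m, int p) {
    assert(p >= 0 && p < m->n);
    while (m->parent[p] != p) {
        m->parent[p] = m->parent[m->parent[p]];
        p = m->parent[p];
    }
    return p;
}

/* Rule 1: members of the same communicator (including one released with
 * MPI_COMM_FREE), window, or file handle are connected. */
static void conn_add_group(conn_model_t *m, const int *members, int count) {
    if (count <= 0)
        return;
    int root = conn_find(m, members[0]);
    for (int i = 1; i < count; i++) {
        int r = conn_find(m, members[i]);
        m->parent[r] = root; /* merge this member's set into the group's */
    }
}

/* Rule 2 (transitivity) falls out of comparing set representatives. */
static bool conn_connected(conn_model_t *m, int a, int b) {
    return conn_find(m, a) == conn_find(m, b);
}
```

For example, if job A = {0, 1} spawns job B = {2, 3} (one intercommunicator spanning {0, 1, 2, 3}) and B later connects to job C = {4, 5} (another spanning {2, 3, 4, 5}), conn_connected reports processes 0 and 5 as connected even though they share no communicator, which is why A's MPI_FINALIZE can end up waiting on C.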
Disconnecting
Once communication between dynamic processes is no longer required,
MPI_COMM_DISCONNECT can be invoked to formally break the
communication channels between them (see the "MPI
Connected" sidebar). Connected processes affect each other in several
ways; independent processes are unaffected by each other's run-time
behavior (in terms of MPI semantics).
Processes that are spawned are connected to their parents, since SPAWN
returns an intercommunicator that includes both the parents and the
children. Processes that establish communication via CONNECT
and ACCEPT, or via JOIN, are also connected.
To disconnect from another job, all groups referring to processes in
that job must be freed. Groups spanning the two jobs may exist in
communicators, file handles, or one-sided window handles (the latter
two are not discussed in this column). Hence, it may be
necessary to release multiple handles (communicators, files, windows)
before the processes become independent of each other.
Note that, for the purposes of disconnecting, communicators must be
released via MPI_COMM_DISCONNECT instead of MPI_COMM_FREE. There is a
subtle but important difference: MPI says that MPI_COMM_FREE only marks
the communicator for deallocation and does not wait for pending
communication; any pending communication is allowed to continue (and
complete) in the background. MPI_COMM_DISCONNECT, in contrast, does not
return until all pending communication on the communicator has
completed; it then deallocates the communicator and sets the handle to
MPI_COMM_NULL. Hence, when DISCONNECT returns, the communicator
has truly been destroyed.
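The release rules above can be made concrete with a small bookkeeping model. This is a hypothetical sketch, not an MPI API: span_handle_t, model_comm_free, model_comm_disconnect, model_release_other, and jobs_connected are invented names, and the model only tracks the handles spanning one pair of jobs (it ignores indirect connections through a third job). It encodes two points from this section: every spanning handle must be released before the jobs become independent, and a communicator released with MPI_COMM_FREE still leaves the jobs connected.

```c
#include <assert.h>
#include <stdbool.h>

/* HYPOTHETICAL model (not an MPI API) of the handles whose groups span
 * two jobs: communicators, one-sided windows, and file handles. */
typedef enum { KIND_COMM, KIND_WIN, KIND_FILE } handle_kind_t;
typedef enum { STATE_ACTIVE, STATE_FREED, STATE_RELEASED } handle_state_t;

typedef struct {
    handle_kind_t kind;
    handle_state_t state;
} span_handle_t;

/* Models MPI_COMM_FREE: the handle is gone, but per the MPI-2 rules the
 * processes remain connected, so FREED is a terminal state. */
static void model_comm_free(span_handle_t *h) {
    assert(h->kind == KIND_COMM && h->state == STATE_ACTIVE);
    h->state = STATE_FREED;
}

/* Models MPI_COMM_DISCONNECT (waiting for pending traffic is not modeled). */
static void model_comm_disconnect(span_handle_t *h) {
    assert(h->kind == KIND_COMM && h->state == STATE_ACTIVE);
    h->state = STATE_RELEASED;
}

/* Models releasing a window or file handle (MPI_WIN_FREE, MPI_FILE_CLOSE). */
static void model_release_other(span_handle_t *h) {
    assert(h->kind != KIND_COMM && h->state == STATE_ACTIVE);
    h->state = STATE_RELEASED;
}

/* The jobs stay connected while any spanning handle is still active, or
 * if any communicator between them was released with MPI_COMM_FREE. */
static bool jobs_connected(const span_handle_t *hs, int n) {
    for (int i = 0; i < n; i++)
        if (hs[i].state != STATE_RELEASED)
            return true;
    return false;
}
```

With two communicators and one file handle spanning the jobs, disconnecting one communicator and closing the file still leaves the jobs connected until the second communicator is disconnected; and once a communicator has been released with MPI_COMM_FREE, this model (following the MPI-2 rules) never reports the jobs as independent again.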