|
Page 1 of 2
Asynchronous Joy
Did you ever notice that Fred Flintstone was the only one who powered
his car? You could see his feet running below the car, pushing it
along, but no one else ever helped. This is clearly a single inertia,
multiple dependents (SIMD) situation - it's almost like Fred was
providing a taxi service for all the other freeloaders. Indeed,
getting some of the others to help push would have resulted in greater
efficiency - a multiple inertia, multiple dependents (MIMD)
scenario. I call this the Flint's Taxi Economy principle.
(say it out loud)
The Story So Far
So far, we've talked about a lot of the basics of MPI. Parallel
startup and short, MPI terminology, some point-to-point and collective
communication, datatypes, communicators... what on earth could be
next? Is there really more?
Why yes, Virginia, yes there is. What we've talked about so far can be
used to write a wide variety of MPI programs. But most of the previous
discussion has been about blocking calls that, for example, may or may
not execute in parallel to the main computation. Specifically, we've
discussed standard mode sends and receives (MPI_SEND
and MPI_RECV). Most communication networks function at least
an order of magnitude slower than local computations. Hence, if an MPI
process has to wait for non-local communication, critical CPU cycles
are lost while the operating system blocks the process, waits for the
communication, and then resumes the process. Parallel applications
therefore typically run at their peak efficiency when there can be a
true overlap of communication and computation.
The MPI standard allows for this overlap with several forms of
non-blocking communication.
Immediate (Non-Blocking) Sends
The basic non-blocking communication function
is MPI_ISEND. It gets its name from being
an immediate, local operation. That is,
calling MPI_ISEND will return to the caller immediately, but
the communication operation may be ongoing. Specifically, the caller
may not re-use the buffer until the operation started
with MPI_ISEND has completed (completion semantics are
discussed below). In all other ways, MPI_ISEND is just like
the standard-mode MPI_SEND: think of MPI_ISEND as
the starting of a communication, and think of MPI_SEND as
both the starting and completion of a communication (remember that MPI
defines completion in terms of buffers - the operation is
complete when the buffer is available for re-use, but doesn't specify
anything about whether the communication has actually occurred or
not).
The C binding for MPI_ISEND is:
Figure 1:
C binding for MPI_ISEND
int MPI_Isend(void *buf, int count, MPI_Datatype dtype, int dest, int tag,
MPI_Comm comm, MPI_Request *req);
|
You'll notice that MPI_ISEND takes the same parameters
as MPI_SEND except for the last one: a pointer to
an MPI_Request. The MPI implementation will fill the request
with a handle that can be used to test or wait for completion of the
send. More on that later.
| Sidebar: Buffered Sends Are Evil |
|
There are three buffered send
functions: MPI_BSEND, MPI_IBSEND,
or MPI_BSEND_INIT, corresponding to standard, immediate, and
persistent modes, respectively. Unlike the other kinds of sends, these
functions are local operations. If the send cannot complete
immediately, the MPI implementation must copy the message to an
internal buffer and complete it later. Regardless, the message buffer
must be returned to the caller and be available for use.
This may sound like a great convenience - the application is free from
having to perform buffer management. But it almost guarantees a
performance loss; at least some MPI implementations always do
a buffer copy before attempting to send the message. This adds latency
to the message's travel time.
Whenever possible, avoid buffered sends.
|
"Immediate" also applies to MPI's other modes of sending: synchronous,
buffered, and ready. Synchronous sends will not complete until a
matching receive has been posted at the destination process. Hence,
when a synchronous send completes, not only is the message buffer
available for re-use, the application is also guaranteed that at least
some communication has occurred (although it does not
guarantee that the entire message has been received yet).
Buffered sends mean that the MPI implementation may copy the message
to an internal buffer before sending it. See the sidebar entitled
Buffered Sends Are Evil
for more information on buffered mode sends. Ready sends are an
optimization when it can be guaranteed that a matching receive has
already been posted at the destination (specifically, it is erroneous
to initiate a ready send if a matching receive has not already been
posted). In such cases, the MPI implementation may be able to take
shortcuts in sending protocols to reduce latency and/or increase
bandwidth. Not all MPI implementation actually implement
optimizations; in the worse case, a ready send is identical to a
standard send.
Aside from their names, all three modes have identical bindings
to MPI_ISEND: MPI_ISSEND, MPI_IBSEND,
and MPI_IRSEND for immediate synchronous, buffered, and ready
send, respectively.
Immediate (Non-Blocking) Receives
MPI also has an immediate receive. Just like the immediate send, the
immediate receive starts a receive (or "posts" a
receive). Its C binding is identical to MPI_RECV except for
the last argument:
Figure 2:
C binding for MPI_IRECV
int MPI_Irecv(void *buf, int count, MPI_Datatype dtype, int src, int tag,
MPI_Comm comm, MPI_Request *req);
|
Just like the immediate send functions, the final argument is a
pointer to an MPI_Request. Upon return
from MPI_IRECV, the request can be used for testing or
waiting for the receive to complete.
Who Matches What?
With all the different flavors of sending and receiving, how can you
tell which send will be matched with which receive? Thankfully, it's
uncomplicated. When a process receives a message, it looks for a
matching receive only based on the signature of the
message: the message's tag, communicator, and source and destination
arguments. The sending mode (standard, synchronous, buffered, or read)
and type (blocking, non-blocking, or persistent) are not used for
matching incoming messages to pending receives.
Completion of Non-Blocking Communications
Once a non-blocking communication has been started (regardless of
whether it is a send or a receive), the application is given
an MPI_Request check for completion of the operation. Two
operations are available to check for completion of a
request: MPI_TEST and MPI_WAIT:
Figure 3:
C bindings for MPI_TEST and MPI_WAIT
int MPI_Test(MPI_Request *req, int *flag, MPI_Status *status);
int MPI_Wait(MPI_Request *req, MPI_Status *status);
|
Both take pointers to an MPI_Request and
an MPI_Status. The request is checked to see if the
communication has completed. If it has, the status is filled in with
details about the completed operation. MPI_TEST will return
immediately, regardless of whether the operation has completed (the
flag argument is set to 1 if it completed); it can be used for
periodic polling. MPI_WAIT will block until the operation has
completed.
The TEST and WAIT functions shown above check for
the completion of a single request; other variations exist for
checking arrays of requests (e.g., checking for completion of one in
the array, one or more in the array, or all in the
array): MPI_*ANY, MPI_*SOME, MPI_*ALL,
respectively, where * is either TEST
or WAIT.
|