To block or not to block, that is the question. Read the following
while I work on my answer.
Last month we started discussing non-blocking communication (get
it?). We covered the basic non-blocking (or immediate) send
and receive functions - all of which start a communication -
and touched on their various flavors. We also discussed
the TEST and WAIT functions, and how they are used
to complete communications.
Recall that previous articles have only covered standard
communication (sometimes called "blocking" communication, even though
the functions may not always block!): functions that will not return
until MPI guarantees that the buffer can be [re-]used. Non-blocking
communication separates the initiation of a communication from its
completion, and opens up the possibility of overlapping communication
with computation.
This month, we'll talk more about non-blocking methods and benefits,
and fuel the fire with some more examples about how and why they can
be useful to your MPI application. And remember, latency is like a
good speech; the shorter, the better.
Persistent Sends and Receives
Another form of non-blocking communication is
MPI's persistent messages. Persistent communication offers a
slight optimization to applications that repeatedly send or receive a
buffer with the same message signature. In such cases, the use of
persistent communication can reduce overall latency.
The rationale is to pass all the arguments (buffer, count, datatype,
tag, source/destination, and communicator) and perform the setup
required for the communication only once. Then, in each
iteration of the application, simply say "go" on the operation that was
set up earlier and let the communication commence. For example:
Listing 1:
Simple persistence
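MPI_Status status;
MPI_Request req;
/* Bind buffer, count, datatype, peer, tag, and communicator once */
MPI_Send_init(buf, count, dtype, dest, tag, comm, &req);
while (looping) {
    MPI_Start(&req);           /* "go": start this iteration's send */
    do_work();                 /* must not modify buf while the send is active */
    MPI_Wait(&req, &status);   /* completes the send; req becomes inactive, not freed */
}
MPI_Request_free(&req);        /* release the persistent request */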
The MPI_SEND_INIT function creates a request and sets up the
communication. Its signature is identical to MPI_ISEND (all
the normal sending parameters and the address of
an MPI_Request to fill). The MPI_START function
actually starts the communication operation. The send is a
non-blocking operation and therefore must be finished with
a TEST or WAIT operation. During the next iteration,
there is no need to invoke MPI_SEND_INIT again - we
simply START and WAIT the request. After the loop
has completed, the persistent request should be released with
MPI_REQUEST_FREE. This call tells MPI that the application will not
use that request again, so it is safe to destroy it and free all
associated resources.
MPI_SEND_INIT is a standard mode persistent
send; MPI_SSEND_INIT, MPI_BSEND_INIT,
and MPI_RSEND_INIT are the synchronous, buffered, and ready
mode persistent functions, respectively. MPI_RECV_INIT is the
persistent receive. They all function similarly
to MPI_SEND_INIT: use the INIT function to create
the request, use the START function to initiate the
communication, and finally use some flavor of TEST
or WAIT to complete it. Also note that just like
the TEST and WAIT functions, START has a
variant that can operate on an array of
requests: MPI_STARTALL.
Why Bother With Non-Blocking?
Invoking special functions and creating additional logic for splitting
the initiation and completion of communications can be quite a
hassle. Why bother?
As with parallel computing in general, the answer is rooted in
optimization. For example, some networks are powered by communication
co-processors - processors that are separate from the main CPU and can
progress message passing events independently of the operating system
and user's application. This design allows even single-threaded MPI
implementations to effect at least some degree of asynchronous
communication progress while the application is not executing inside
the MPI library; the network itself can be given responsibility for
some portion of MPI semantics.
Additionally, a blocking standard mode call handles only one
communication at a time; the application cannot start the next one
until the current call returns. Non-blocking functions allow the application to
initiate multiple communication operations, enabling the MPI
implementation to progress them simultaneously. Consider the following
code example:
Listing 2:
Cascading linearity
while (looping) {
    if (i_have_a_left_neighbor)
        MPI_Recv(inbuf, count, dtype, left, tag, comm, &status);
    if (i_have_a_right_neighbor)
        MPI_Send(outbuf, count, dtype, right, tag, comm);
    do_other_work();
}
Assume that at least one process does not have a left neighbor, and
consider how this code will run in parallel: every process will
receive from its left and then send to its right. But notice that the
above code uses blocking standard mode communication: no process can
send to its right until its receive from the left has completed. As a
direct result, this algorithm is actually serialized - it will execute
in a domino-like fashion, causing each process to block while waiting
for its left neighbor.
Using non-blocking communication allows the MPI implementation to
progress both communications simultaneously:
Listing 3:
Non-blocking can avoid cascading linearity
1 while (looping) {
2     nreq = 0;   /* number of active requests (count is the message size) */
3     if (i_have_a_left_neighbor)
4         MPI_Irecv(inbuf, count, dtype, left, tag, comm, &req[nreq++]);
5     if (i_have_a_right_neighbor)
6         MPI_Isend(outbuf, count, dtype, right, tag, comm, &req[nreq++]);
7     MPI_Waitall(nreq, req, statuses);
8     do_other_work();
9 }
The MPI_WAITALL on line 7 allows both communications
to progress simultaneously. Specifically, the send can
proceed before the receive completes. This code will
therefore operate in a truly parallel fashion and will avoid the
domino effect. (Indeed, the astute reader will recognize that a
clever use of MPI_SENDRECV could achieve the same result.) Note,
however, that this particular code example has a subtle implication:
the WAITALL will block until both communications are complete.
Blocking on line 7 means that there still may be some "dead" time
while waiting for network communication to complete - time that could
have been used for other work. This situation may be unavoidable in
some applications, but others may have some work that can be performed
while waiting for the communications to complete. For example:
Listing 4:
Delayed MPI_WAITALL
1  while (looping) {
2      nreq = 0;   /* number of active requests (count is the message size) */
3      if (i_have_a_left_neighbor)
4          MPI_Irecv(inbuf, count, dtype, left, tag, comm, &req[nreq++]);
5      if (i_have_a_right_neighbor)
6          MPI_Isend(outbuf, count, dtype, right, tag, comm, &req[nreq++]);
7      do_some_work();
8      MPI_Waitall(nreq, req, statuses);
9      do_rest_of_work();
10 }
Note the addition of do_some_work()
and do_rest_of_work() on lines 7 and 9,
respectively. do_some_work() represents work that can be
done before the communication completes; it must not read inbuf or
modify outbuf, because both buffers are still in use by MPI until the
MPI_WAITALL returns. Hence, the application can use the "dead" time
while message passing is occurring in the background - an overlap of
communication and computation. This method works best on networks
and/or MPI implementations that allow for at least some degree of
asynchronous progress, but can even benefit single-threaded,
synchronous MPI implementations. Once the communication completes,
do_rest_of_work() executes; presumably it performs the work that
depends upon the received messages.
Note that since the same buffers and communication parameters are used
every iteration, a further optimization could use the persistent
mode. This improvement allows the MPI implementation to set up the
communications once, and simply say "go" every iteration:
Listing 5:
Adding persistent requests into the mix
int nreq = 0;   /* number of persistent requests (count is the message size) */
if (i_have_a_left_neighbor)
    MPI_Recv_init(inbuf, count, dtype, left, tag, comm, &req[nreq++]);
if (i_have_a_right_neighbor)
    MPI_Send_init(outbuf, count, dtype, right, tag, comm, &req[nreq++]);
while (looping) {
    MPI_Startall(nreq, req);
    do_some_work();
    MPI_Waitall(nreq, req, statuses);
    do_rest_of_work();
}
for (int i = 0; i < nreq; ++i)
    MPI_Request_free(&req[i]);   /* done with the persistent requests */