- Published on Sunday, 05 March 2006 14:00
- Written by Jeff Squyres
- Hits: 8168
All production-quality MPI implementations can handle simultaneous progress of multiple requests, even those that do not allow true asynchronous progress. Hence, even if polling (via TEST operations) is required, non-blocking communication programming models can still represent a large performance gain as compared to standard / blocking mode communication.
Just because an operation is non-blocking does not mean that it is somehow automatically more efficient than if it were blocking. Indeed, many of the same best practices that apply to blocking communication also apply to non-blocking communication. One such best practice judiciously pre-posting non-blocking receives. This method potentially helps an MPI implementation reduce the use of temporary buffers.
For example, if a message is received in an MPI process that is unexpected - meaning that the application did not [yet] post a corresponding receive - the MPI implementation may have to allocate a temporary buffer to receive it. If a matching receive is ever posted, the MPI implementation copies the message from the temporary buffer into the destination buffer.
However, if a non-blocking receive is posted before the message is received, once the message arrives, it is expected and can be received directly into the target buffer. No temporary buffer needs to be allocated and no extra memory copy is necessary. Hence, ensuring to pre-posting receives can increase the efficiency of an MPI application.
MPI-2: Multiple Types of Requests
MPI-2 defines two new types of operations that can be started and completed using MPI_Request handles: parallel I/O and user-mode "generalized" requests. Although those operations are the topics for future columns, suffice it to say that both of them follow the same general model as non-blocking point-to-point communication: actions are started with calls to MPI functions that generate requests and are completed with calls to TEST or WAIT operations.
A subtle implication is that the array-based TEST and WAIT functions can accept multiple MPI_Request handles regardless of the type of pending operation that they represent. Hence, it is possible to create an array of requests that encompasses both point to point and I/O communications, and have MPI_WAITALL wait for the completion of all of them.
ROMIO is a popular implementation of many of the MPI-2 I/O function calls from Argonne National Laboratory (e.g., MPI_FILE_OPEN, MPI_FILE_READ, etc.). ROMIO's implementation is layered on top of MPI-1 point-to-point communication; it is specifically designed to be used as an add-on to existing MPI implementations (such as LAM/MPI, LA-MPI, FT-MPI, and MPICH, to name a few). This layering creates problems because ROMIO cannot re-define the underlying type MPI_Request since it has already been defined by the underlying MPI implementation. Moreover, the back-end of MPI_Request is different in every MPI implementation; ROMIO can't extend it in a portable way.
ROMIO's solution was to create a new type: MPIO_Request. All MPI_FILE* functions that are supposed to take an MPI_Request as a parameter instead take an MPIO_Request. This situation means that ROMIO technically does not conform to the MPI-2 standard, but this detail is usually overlooked for the sake of portability and functionality.
There is a notable side effect, however. Since MPI_TEST and MPI_WAIT (and their variants) take MPI_Request arguments, they cannot accept ROMIO MPIO_Requests. Hence, ROMIO implements its own MPIO_TEST and MPIO_WAIT functions. As such, MPI implementations that use ROMIO generally do not support invoking the various TEST and WAIT functions with arrays of point-to-point and I/O requests.
Where to Go From Here?
Non-blocking communications, when used properly, can provide a tremendous performance boost to parallel applications. In addition to allowing the MPI to perform at least some form of asynchronous progress (particularly when used with communication co-processor-based networks), it allows the MPI to progress multiple communication operations simultaneously.
Got any MPI questions you want answered? Wondering why one MPI does this and another does that? Send them to the MPI Monkey.
|ROMIO: A High-Performance, Portable MPI-IO Implementation||http://www.mcs.anl.gov/romio|
|MPI Forum (MPI-1 and MPI-2 specifications documents)||http://www.mpi-forum.org|
|MPI - The Complete Reference: Volume 1, The MPI Core (2nd ed) (The MIT Press)||By Marc Snir, Steve Otto, Steven Huss-Lederman, David Walker, and Jack Dongarra. ISBN 0-262-69215-5|
|MPI - The Complete Reference: Volume 2, The MPI Extensions (The MIT Press)||By William Gropp, Steven Huss-Lederman, Andrew Lumsdaine, Ewing Lusk, Bill Nitzberg, William Saphir, and Marc Snir. ISBN 0-262-57123-4.|
|NCSA MPI tutorial||http://webct.ncsa.uiuc.edu:8900/public/MPI/|
This article was originally published in ClusterWorld Magazine. It has been updated and formatted for the web. If you want to read more about HPC clusters and Linux, you may wish to visit Linux Magazine.
Jeff Squyres is the Assistant Director for High Performance Comptuing for the Open Systems Laboratory at Indiana University and is the one of the lead technical architects of the Open MPI project.
- << Prev Page
- - Next Page