2: Serialization

Many users are nervous about using MPI's various modes of non-blocking communication and instead simply use MPI_SEND and MPI_RECV. This habit can degrade performance by unintentionally serializing parallel applications. Processes blocked in MPI_SEND or MPI_RECV may waste valuable CPU cycles while simply waiting for communication with peer processes. This situation can even lead to a domino-like effect where a series of processes wait for each other and progress occurs only in a peer-by-peer fashion - just like the penguins in the beginning of this article.

This behavior can almost always be fixed in the application. While some algorithms simply cannot avoid this problem, most can be refactored to allow a true overlap of computation and communication. Specifically: allow the MPI implementation to perform message passing "in the background" while the user application is performing useful work. A common technique is to use a pair of buffers, swapping between them on successive iterations. For example, in iteration N, initiate communication using buffer A and perform useful local work on buffer B. In iteration N+1, swap the buffers: communicate with buffer B and work on buffer A. See the pseudocode in Listing 1 for an example.

Listing 1: Communication and Computation Overlap
buffer_comm = A;
buffer_work = B;
for (...) {
    /* Send the communication buffer */
    MPI_Isend(buffer_comm, ..., &req);

    /* Do useful work on the other buffer */
    do_work(buffer_work);

    /* Finish the communication */
    MPI_Wait(&req, &status);

    /* Swap the buffers */
    buffer_tmp = buffer_comm;
    buffer_comm = buffer_work;
    buffer_work = buffer_tmp;
}

And the Number 1, All-Time Favorite Evil to Avoid in Parallel is...

1: Assuming MPI_SEND Will [Not] Block

In a previous edition of this column, I included a sidebar entitled "To Block or Not To Block" describing typical user confusion as to whether MPI_SEND is supposed to block or not. It remains a popular question, frequently asked in multiple forms:

  • "My application blocks in MPI_SEND - but only sometimes. Why?"
  • "Why does my application work fine with Foo MPI, but deadlock in Bar MPI?"
  • "When MPI_SEND returns, has the destination received the message?"

MPI_SEND and MPI_RECV are called "blocking" by the MPI-1 standard, but they may or may not actually block. Whether or not an unmatched send will block typically depends on how much buffering the implementation provides. For example, short messages are usually sent "eagerly" - regardless of whether a matching receive has been posted or not. Long messages may be sent with a rendezvous protocol, meaning that the send will not actually complete until the target has initiated a matching receive. This behavior is legal because the semantics of MPI_SEND do not actually define whether the message has been sent when it returns. The only guarantee that MPI makes is that the send buffer can safely be reused when MPI_SEND returns.
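
The eager/rendezvous distinction can be pictured as a simple decision rule. The following is a toy model only, not code from any MPI implementation: the function names and the eager_limit threshold are hypothetical, and real implementations choose their own thresholds and protocols (often per network).

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model: the two protocols described in the text. */
enum protocol { EAGER, RENDEZVOUS };

/* Hypothetical rule: messages shorter than eager_limit bytes go out
   eagerly; everything else uses rendezvous. */
enum protocol choose_protocol(size_t msg_bytes, size_t eager_limit)
{
    return msg_bytes < eager_limit ? EAGER : RENDEZVOUS;
}

/* In this model, can a blocking send return right now?
   - Eager: yes, whether or not a matching receive has been posted.
   - Rendezvous: only once the target has posted a matching receive. */
bool send_can_return(size_t msg_bytes, size_t eager_limit, bool receive_posted)
{
    return choose_protocol(msg_bytes, eager_limit) == EAGER || receive_posted;
}
```

With an assumed limit of 4096 bytes, send_can_return(100, 4096, false) is true, while send_can_return(1048576, 4096, false) is false until the receive is posted. The same application code can therefore return immediately for one message size and block for another.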

Receives, by definition, will not return until a matching message has actually been received. If a matching short message was previously sent eagerly, for example, the receive may complete "immediately". A message that arrives before its matching receive has been posted is called an "unexpected" message, and MPI implementations typically provide some level of implicit buffering for this condition: eagerly-sent, unmatched messages are simply stored in internal buffers at the target until a matching receive is posted by the application. A local memory copy is then all that is necessary to complete the receive.

Note that it is also legal for an MPI implementation to provide zero buffering - to effectively disallow unexpected messages and block MPI_SEND until a matching receive is posted (regardless of the size of the message). MPI applications that assume a particular level of underlying buffering (i.e., applications that assume that MPI_SEND will or will not block) are not conformant, and may run to completion under one MPI implementation but block in another.

Where to Go From Here?

There you have it - my canonical list of things to avoid while programming in parallel. Note that even though this is my favorite list, your mileage may vary - every parallel application is different. The real moral of the story here is to thoroughly understand both your application and the run-time environment of the MPI implementation that you're using. This understanding is the best way to obtain the best performance.

Next column, we'll launch into the nitty-gritty details of non-blocking communication. Stay tuned!

MPI Forum (MPI-1 and MPI-2 specifications documents)
MPI - The Complete Reference: Volume 1, The MPI Core (2nd ed) (The MIT Press) by Marc Snir, Steve Otto, Steven Huss-Lederman, David Walker, and Jack Dongarra. ISBN 0-262-69215-5.
MPI - The Complete Reference: Volume 2, The MPI Extensions (The MIT Press) by William Gropp, Steven Huss-Lederman, Andrew Lumsdaine, Ewing Lusk, Bill Nitzberg, William Saphir, and Marc Snir. ISBN 0-262-57123-4.
NCSA MPI tutorial
This article was originally published in ClusterWorld Magazine. It has been updated and formatted for the web. If you want to read more about HPC clusters and Linux, you may wish to visit Linux Magazine.

Jeff Squyres is the Assistant Director for High Performance Computing for the Open Systems Laboratory at Indiana University and is one of the lead technical architects of the Open MPI project.

Creative Commons License
©2005-2019 Copyright Seagrove LLC, Some rights reserved. Except where otherwise noted, this site is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License. The Cluster Monkey Logo and Monkey Character are Trademarks of Seagrove LLC.