6: Blaming MPI for Programmer Errors
A natural tendency when an application breaks is to blame the MPI
implementation, particularly when your application "works" with one
MPI implementation and (for example) segfaults with another. While no
MPI implementation is perfect, implementations typically go through
heavy testing before release. It is quite possible (even likely) that
your application has a latent bug that simply is not triggered on
some architectures or MPI implementations.
This sounds arrogant (especially coming from an MPI implementer), but
the vast majority of "bug reports" that we receive are actually due to
errors in the user's application (and sometimes they are very subtle
errors). For example, some compilers give variables that the program
never explicitly initialized a default value (such as zero); others do
not. If your code accidentally depends on a variable having a default
value, it may work fine with some platforms and compilers, yet fail on
others.
Before submitting a bug report to the maintainers, double and triple
check your application. Use a memory-checking debugger, such as the
Linux Valgrind package, the Solaris bcheck command-line
checker, or the Purify system. All of these debuggers will report on
the memory usage in your application, including buffer overflows,
reading from uninitialized memory, and so on. You'd be surprised what
will turn up in your application.
Where to Go From Here?
So what did we learn here?
- Ensure your environment is set up correctly. You only need
to do this once.
- Always check non-blocking communication for
completion. Don't leak resources.
- Avoid MPI_PROBE and MPI_IPROBE; they're
evil.
- Ensure that you are using the right compilers.
- Don't blame MPI for your errors. Use memory-checking
debuggers.
If anything, realize that you are not alone if you run into MPI
problems. The problems discussed in this column are all relatively
easy to fix. So even if you can't get your MPI application to run,
don't despair. The solution is probably just a few Google searches or
a system administrator away.
Stay tuned: in the next column, we'll continue the list with my Top 5,
All-Time Favorite Evils to Avoid in Parallel.
Resources
- MPI Forum (MPI-1 and MPI-2 specification documents):
http://www.mpi-forum.org/
- MPI: The Complete Reference, Volume 1: The MPI Core (2nd ed.).
Marc Snir, Steve Otto, Steven Huss-Lederman, David Walker, and
Jack Dongarra. The MIT Press. ISBN 0-262-69215-5.
- MPI: The Complete Reference, Volume 2: The MPI Extensions.
William Gropp, Steven Huss-Lederman, Andrew Lumsdaine, Ewing Lusk,
Bill Nitzberg, William Saphir, and Marc Snir. The MIT Press.
ISBN 0-262-57123-4.
- NCSA MPI tutorial:
http://webct.ncsa.uiuc.edu:8900/public/MPI/
- The Tao of Programming. Geoffrey James. ISBN 0931137071.
- Valgrind:
http://www.valgrind.org/
This article was originally published in ClusterWorld Magazine. It
has been updated and formatted for the web. If you want to read more
about HPC clusters and Linux, you may wish to visit
Linux Magazine.
Jeff Squyres is the Assistant Director for High Performance Computing
for the Open Systems Laboratory at Indiana University and is one of
the lead technical architects of the Open MPI project.