MPI: Is Your Application Spawnworthy?

Concrete Example

Last column, I mentioned that many of the early uses of MPI-2 dynamic processes were rather mundane and usually unnecessary (e.g., launch a singleton ./a.out that launches all of its peers). Now that threads can be mixed with MPI function calls, particularly with respect to dynamic process functionality, more interesting (and useful) options are available.

In short, MPI has previously been used mainly for parallel computing. With proper use of MPI-2 dynamic process concepts, MPI can be used for distributed computing.

For example, the canonical manager/worker parallel model is as follows: a manager starts a set of workers, doles out work to each of them, and waits for results to be returned. The send-work-and-wait-for-answers pattern is repeated until no work remains and all the results have been collected. The manager then tells all workers to quit and everything shuts down.

However, consider reversing the orientation of the model: the manager starts up and waits for workers to connect and ask for work. That is, workers start - and possibly shut down - independently of the manager. This concept is not new; it is exactly what massively distributed projects such as distributed.net and SETI@home (and others) have been doing for years. Although this has been possible in some MPI implementations for some time, only recently have some MPI implementations started to make scalable, massively distributed computing a reality.

Consider a large corporation that has thousands of desktop computers. When the employees go home at night, the machines are typically powered off (or are otherwise idle). What if, instead, those machines could be harnessed for large-scale distributed computations? This goal has actually been the aspiration of many a CIO for years.

Corralling all the machines simultaneously to start a single parallel job is an enormous task (and logistically improbable, to say the least). But if a user-level MPI process on each machine started itself - independently of its peers - when the employee went home for the evening, the model becomes much more feasible. This MPI process can contact a server and join a larger computation and run all night. When the employee returns in the morning, the MPI process can disconnect from the computation (independently from its peers) and go back to sleep.

The model is also interesting when you consider the heterogeneous aspects of it: employee workstations may be one of many different flavors of POSIX, or Windows. A portable implementation of MPI can span all of these platforms, using the full power of C, C++, or Fortran (whatever the science/engineering team designing the application prefers) to implement the application on multiple platforms. MPI takes care of most of the heterogeneous aspects of data communication - potentially allowing the programmers to concentrate on the application (not the differences between platforms).

The server will need to exhibit some fault-tolerant characteristics. For example, it must be smart enough to know when to re-assign work to other resources because a worker suddenly became unavailable. However, these are now fairly well-understood issues (particularly in manager-worker models) and can be implemented in a reasonable fashion.
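The re-assignment bookkeeping can be sketched in plain C, independent of any MPI calls. This is one possible design under stated assumptions, not the way any particular server does it: the work_pool type, the pool_* function names, the fixed NUM_ITEMS, and the integer worker ids are all hypothetical. How the server decides that a worker has vanished (a timeout, a failed communication) is outside the sketch.

```c
#include <assert.h>

#define NUM_ITEMS 8      /* hypothetical size of the work pool */
#define NO_WORKER (-1)   /* marks an item that nobody currently holds */

typedef enum { ITEM_PENDING, ITEM_ASSIGNED, ITEM_DONE } item_state;

typedef struct {
    item_state state[NUM_ITEMS];
    int owner[NUM_ITEMS];  /* worker id holding the item, or NO_WORKER */
} work_pool;

/* Mark every item as pending and unowned. */
void pool_init(work_pool *p)
{
    for (int i = 0; i < NUM_ITEMS; i++) {
        p->state[i] = ITEM_PENDING;
        p->owner[i] = NO_WORKER;
    }
}

/* Give the lowest-numbered pending item to a worker.
 * Returns the item index, or -1 if nothing is pending. */
int pool_assign(work_pool *p, int worker)
{
    assert(worker != NO_WORKER);
    for (int i = 0; i < NUM_ITEMS; i++) {
        if (p->state[i] == ITEM_PENDING) {
            p->state[i] = ITEM_ASSIGNED;
            p->owner[i] = worker;
            return i;
        }
    }
    return -1;
}

/* Record a result. Accepted (returns 1) only if the item is still
 * assigned to this worker; a late answer from a worker that was
 * already written off is rejected (returns 0). */
int pool_complete(work_pool *p, int item, int worker)
{
    if (item < 0 || item >= NUM_ITEMS)
        return 0;
    if (p->state[item] != ITEM_ASSIGNED || p->owner[item] != worker)
        return 0;
    p->state[item] = ITEM_DONE;
    p->owner[item] = NO_WORKER;
    return 1;
}

/* A worker became unavailable: put its unfinished items back in the
 * pending set so that pool_assign can hand them to someone else.
 * Returns the number of items re-queued. */
int pool_worker_lost(work_pool *p, int worker)
{
    int requeued = 0;
    for (int i = 0; i < NUM_ITEMS; i++) {
        if (p->state[i] == ITEM_ASSIGNED && p->owner[i] == worker) {
            p->state[i] = ITEM_PENDING;
            p->owner[i] = NO_WORKER;
            requeued++;
        }
    }
    return requeued;
}
```

Tracking an owner per item is what makes the late-result check possible: if worker 7 is declared lost and its item is handed to worker 9, a result that later trickles in from worker 7 is simply ignored.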

Granted, this model only works for certain types of applications. But it is still a powerful - and simple - concept that is largely unexploited with modern MPI implementations, mainly (I think) because people are unaware that MPI can be used this way.

Where to Go From Here?

It should be noted that there are research projects and commercial products that are specifically designed to utilize idle workstations. Condor, from the University of Wisconsin at Madison, is an excellent project whose software works well, but it is mainly targeted at serial applications (although recent efforts are concentrating on integrating Condor into grid computations). Several vendors have products that function similarly to distributed.net and SETI@home clients (a small daemon that detects when the workstation is idle and requests work from a server). This situation is also quite similar to what some people mean by the term "grid computing." However, none of these current efforts use MPI for their communication framework.

I want to be absolutely clear here: I am not saying that MPI is the answer to everyone's distributed computing problems. I am simply saying that the familiar paradigm of MPI can also be used for distributed computing. While the concepts for it may be relatively young in MPI implementations, the definitions in the standard make it possible, and support in MPI implementations is growing all the time (e.g., in Open MPI). I encourage you to explore this concept and demand more from your MPI implementers.

Got any MPI questions you want answered? Wondering why one MPI does this and another does that? Send them to the MPI Monkey.

Resources
Pi Day songs http://www.winternet.com/ mchristi/piday.html
Pi day FAQ http://mathforum.org/t2t/faq/faq.pi.html
Pi movie http://www.pithemovie.com/
Condor project http://www.cs.wisc.edu/condor/
Open MPI http://www.open-mpi.org
MPI Forum (MPI-1 and MPI-2 specifications documents) http://www.mpi-forum.org/
MPI - The Complete Reference: Volume 1, The MPI Core (2nd ed) (The MIT Press) By Marc Snir, Steve Otto, Steven Huss-Lederman, David Walker, and Jack Dongarra. ISBN 0-262-69215-5
MPI - The Complete Reference: Volume 2, The MPI Extensions (The MIT Press) By William Gropp, Steven Huss-Lederman, Andrew Lumsdaine, Ewing Lusk, Bill Nitzberg, William Saphir, and Marc Snir. ISBN 0-262-57123-4.
NCSA MPI tutorial http://webct.ncsa.uiuc.edu:8900/public/MPI/

This article was originally published in ClusterWorld Magazine. It has been updated and formatted for the web. If you want to read more about HPC clusters and Linux, you may wish to visit Linux Magazine.

Jeff Squyres is leading up Cisco's Open MPI efforts as part of the Server Virtualization Business Unit.

    Creative Commons License
    ©2005-2012 Copyright Seagrove LLC, Some rights reserved. Except where otherwise noted, this site is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License. The Cluster Monkey Logo and Monkey Character are Trademarks of Seagrove LLC.