MPI: Groups and Communicators

Everything You Wanted to Know About Groups and Communicators, But Were Afraid to Ask

Actual e-mail spams seen recently:

  • Banned MPI CD! The government doesn't want you to see this!
  • Our licensed MPI programmers will prescribe parallel applications for free.
  • Enlarge your parallel application performance by 5x with MPIagra!
  • Nigerian bank director needs MPI developers to receive US$25M in offshore funding.

Am I the only one that gets these?

The Story So Far

We've been progressively leading up to more complicated topics in this column - starting with the 6 basic functions of MPI, moving on to the differences between ranks, processors, and processes, detailing what MPI_INIT and MPI_FINALIZE really mean. Last month, we discussed collective communication. But wait - if you call in the next 10 minutes, there's even more! (sorry - spam flashbacks)

Groups and communicators can play a critical role in the selection of parallel algorithms that you use in your application. Although parallel applications can be implemented in many different ways, MPI provides a rich set of process grouping features that are frequently under-utilized in user applications (particularly with respect to collective communications).

MPI Groups and Communicators

I've made references to "communicators" in previous editions of this column and usually made cryptic statements about "fixed sets," "ordered processes," and "communication contexts." But I've never really explained what a communicator is. It's important to understand communicators and what they mean to your application because communicators are the basis for all MPI point-to-point and collective communication.

A communicator is composed of two elements: a group and a unique communications context (actually, it may be two groups - more on that below). Let's discuss groups first.

MPI Groups

An MPI group is a fixed, ordered set of unique MPI processes. The exact definition of an MPI process was discussed in the Jan 2003 edition of this column. Essentially, the MPI implementation is free to define what "MPI process" means. Examples include: a thread, a POSIX process, or a Windows process. Most MPI implementations use the operating system's concept of a "process," but some do define a thread as an MPI process. A process can appear at most once in a group - it is either in the group or not. A process can be in multiple groups, however.

More specifically, an MPI group is a local representation of a set of MPI processes. MPI groups are represented by the opaque type MPI_Group in C applications. Hence, a process can contain local representations of many MPI groups - some of which may not include itself.

MPI defines a rich set of operations on groups; since a group is essentially an ordered set (in the algebraic sense of the word), an application can perform group unions, intersections, inclusions, exclusions, comparisons, and so on. These operations, while not commonly invoked in many user applications, form the backbone of communicator functionality and may be used by the MPI implementation itself.

As an example, one of the group operations provided by MPI is the comparison of two groups (MPI_GROUP_COMPARE), which can yield one of three results:

  • MPI_IDENT: The two groups contain the same set of processes in the same order
  • MPI_SIMILAR: The two groups contain the same set of processes, but in a different order
  • MPI_UNEQUAL: The two groups do not contain the same set of processes.

While seemingly an unimportant operation, it provides insight into one of MPI's central philosophies: the membership in a group is fixed and strongly ordered. This feature is most apparent to users because communicators have fixed, ordered memberships. But this is only a by-product of the fact that a communicator contains a group.

MPI Communicators

Communicators are represented in MPI C programs by the type MPI_Comm (Fortran programs use integers). Although a communicator is a local MPI object (i.e., it physically resides in the MPI process), it represents a process' membership in a larger process group. Specifically, even though MPI_Comm objects are local, they are always created collectively among all members of the group that the communicator contains. Hence, a process can only have an MPI_Comm handle for communicators of which it is a member.

The context of a communicator is effectively a guarantee that a message sent on one communicator will never be received on a different communicator. Consider the arguments of the MPI_SEND and MPI_RECV functions (C binding shown below):

int MPI_Send(void *buf, int count, MPI_Datatype dtype, int dest, int tag, 
             MPI_Comm comm);

int MPI_Recv(void *buf, int count, MPI_Datatype dtype, int src, int tag, 
             MPI_Comm comm, MPI_Status *status);

A sent message will only be delivered to a matching receive in the destination process. This means that the MPI_SEND has to use the triple (dest, tag, comm) that specifies a peer process in the communicator who has posted a receive with a corresponding (src, tag, comm) triple. The src value must be the sender's rank (and dest the receiver's rank), or the receiver can use the wildcard MPI_ANY_SOURCE; the tag values must be equal, or the receiver can use the wildcard MPI_ANY_TAG; the comm values must represent the same communicator.

Note the last part - the comm values must represent the same communicator; there is no wildcard communicator value. The communicator therefore functions similarly to the "tag" argument in MPI_SEND (and friends) - think of it as a system-level tag. Specifically, a message sent on a given (tag, communicator) tuple will only ever be received on the same (tag, communicator) tuple by the receiver (with the MPI_ANY_TAG exception).

Sidebar: Communicators -- what's the point?

So what's all this hoopla about communicators? Why bother? Why not just send and receive messages, filtering them via tags?

One answer is parallel libraries. Libraries that use message passing need to have a way to guarantee that the messages they send and receive will never be confused with messages sent and received by the user application. Communicators, with their unique (and private) communication context, allow this message passing safety.

Many parallel libraries, for example, use the MPI_COMM_DUP call at startup time to duplicate MPI_COMM_WORLD - the pre-defined communicator created after MPI_INIT that contains all processes that were started together. The new communicator will have exactly the same process group, but a different (unique) context than MPI_COMM_WORLD. The library can then use this communicator for all of its communications.

Communicator Properties

Remember that a communicator contains a group, and a group is a strongly ordered set of processes. Therefore, communicators are also strongly ordered sets of processes. More importantly, the order is guaranteed to be the same on all processes in the communicator (group). Hence, the process referred to by (index, communicator) is guaranteed to be the same on all processes in the communicator.

The "index" value ranges from 0 to the number of processes in the communicator minus 1. This index value is called the process' "rank" in the communicator. Hence, MPI point-to-point communication routines (e.g., MPI_SEND and MPI_RECV) are expressed in terms of ranks and communicators - together, the pair identifies the source or destination of the message.

Don't get carried away with the term "rank," however. A rank refers to a specific process in a specific communicator. A single rank value may therefore refer to multiple different MPI processes. For example, it is not correct to say "send to rank 0." It is more correct to say "send to MPI_COMM_WORLD rank 0."

There are actually two kinds of communicators: intracommunicators (those that only contain one group of processes) and intercommunicators (those that contain two groups of processes). Let's talk about the most common kind first, intracommunicators (one group).

Intracommunicators

The name "intracommunicator" specifically refers to communication within a single group. MPI_COMM_WORLD is perhaps the most famous of intracommunicators. It is defined in the MPI-1 standard as "all processes the local process can communicate with after initialization (including itself), and is defined once MPI_INIT has been called." Although the specific meaning of this statement varies between different MPI implementations, it generally means that all MPI processes started via mpirun will be included in MPI_COMM_WORLD together.

Another, lesser-known pre-defined intracommunicator is MPI_COMM_SELF, which is defined to only include the local process. This communicator can be useful for loopback kinds of communications, depending on the application's algorithms.

    Creative Commons License
    ©2005-2012 Copyright Seagrove LLC, Some rights reserved. Except where otherwise noted, this site is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License. The Cluster Monkey Logo and Monkey Character are Trademarks of Seagrove LLC.