MPI: How to Succeed in Datatypes Without Really Trying

C Structure Layout Example

Although the MPI vector and index interfaces can build complex and useful datatypes, by definition, they cannot describe arbitrary data layouts. The MPI_TYPE_STRUCT function allows the specification of arbitrary type maps.

The use of MPI_TYPE_STRUCT is admittedly fairly clunky: it is necessary to specify arrays of field offsets, counts, and datatypes. Listing 4 shows a program constructing an MPI datatype for a complex C structure (note that several shortcuts were taken for space reasons).

Listing 4: Building an MPI Datatype for a C Structure
 1 struct my_struct {
 2   int int_values[10];
 3   double average;
 4   char debug_name[MAX_NAME_LEN];
 5   int flag;
 6 };
 7 void make_datatype(MPI_Datatype *new_type) {
 8   struct my_struct foo;
 9   int i, counts[4] = { 10, 1, MAX_NAME_LEN, 1 };
10   MPI_Datatype types[4] = { MPI_INT, MPI_DOUBLE, MPI_CHAR, MPI_INT };
11   MPI_Aint disps[4];
12   MPI_Address(foo.int_values, &disps[0]);
13   MPI_Address(&foo.average, &disps[1]);
14   MPI_Address(foo.debug_name, &disps[2]);
15   MPI_Address(&foo.flag, &disps[3]);
16   for (i = 3; i >= 0; --i)
17     disps[i] -= disps[0];
18   MPI_Type_struct(4, counts, disps, types, new_type);
19   MPI_Type_commit(new_type);
20 }

Note the use of the type MPI_Aint on line 11. An MPI_Aint is defined as an integer guaranteed to be large enough to hold an address. C programmers are likely to be confused by lines 12-15: the MPI_ADDRESS function is equivalent to assigning the address of the first argument to the second argument. The purpose of having MPI_ADDRESS is twofold: 1) avoid ugly casting, and 2) provide "pointer-like" semantics in Fortran. Lines 16-17 subtract off the base address of the structure, converting the disps array from absolute addresses to displacements relative to the start of the structure. The loop runs from the last element down to the first so that disps[0], the base address, is still intact when the other entries are adjusted.
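As a cross-check on the relative displacements described above, standard C can report the same byte offsets with the offsetof macro. The sketch below is not the listing's method; it is an alternative view of the same numbers. The value of MAX_NAME_LEN and the helper name field_displacements are assumptions made for illustration.

```c
#include <assert.h>  /* assert */
#include <stddef.h>  /* offsetof, size_t */

#define MAX_NAME_LEN 32  /* assumed value; Listing 4 leaves it undefined */

struct my_struct {
  int int_values[10];
  double average;
  char debug_name[MAX_NAME_LEN];
  int flag;
};

/* Hypothetical helper: fill disps[0..3] with each field's byte offset
 * from the start of struct my_struct, in declaration order. */
void field_displacements(size_t disps[4]) {
  disps[0] = offsetof(struct my_struct, int_values);
  disps[1] = offsetof(struct my_struct, average);
  disps[2] = offsetof(struct my_struct, debug_name);
  disps[3] = offsetof(struct my_struct, flag);

  /* C places no padding before the first member, so its offset is 0. */
  assert(disps[0] == 0);
}
```

The offsets may include compiler-inserted padding (for example, before the double), which is exactly why the displacements are computed rather than summed by hand. To use them with MPI, each value would be converted to MPI_Aint.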

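The MPI_TYPE_STRUCT call on line 18 creates a datatype from the type map components, and it is committed on line 19. The new_type datatype can now be used to send instances of struct my_struct. Note that MPI_ADDRESS and MPI_TYPE_STRUCT are MPI-1 names; they were deprecated in MPI-2.0 and removed in MPI-3.0, and the equivalent calls in later versions of the standard are MPI_GET_ADDRESS and MPI_TYPE_CREATE_STRUCT.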

Sidebar: One Message vs. Many Messages

A common knee-jerk reaction to the complexity of MPI datatypes is "I'll just send my structure in a series of messages rather than construct a datatype." Consider the C structure shown in Listing 4.

The contents of my_struct could easily be sent in four separate messages. But consider the actual cost of sending four messages instead of constructing an MPI datatype and sending one message: although the same amount of data will be sent, you'll likely be sending at least four times the overhead for the same amount of data, causing the overall efficiency ratio to drop. Specifically, most MPI implementations send a fixed amount of overhead for each message. Hence, you'll be sending that fixed overhead three more times than is necessary. Additionally, you may incur up to four times the latency before the entire dataset is received at the destination.

If this structure is sent frequently in your application (e.g., once every iteration), or if you have to send arrays of this structure, the extra overhead and latency can quickly add up and noticeably degrade performance.

Instead, it is frequently better to make an appropriate datatype and send the entire structure in a single message. Indeed, entire arrays of the structure can also be sent in a single message, dramatically increasing efficiency as compared to sending four messages for each structure instance in the array.
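The sidebar's argument can be made concrete with a rough back-of-the-envelope model. The sketch below assumes a simple linear cost (a fixed per-message latency plus bytes divided by bandwidth) and that the messages are sent one after another; real networks and MPI implementations behave in more complicated ways. The function names and all parameter values are hypothetical.

```c
#include <assert.h>  /* assert */

/* Assumed linear cost model for one message:
 * time = fixed per-message latency + payload bytes / bandwidth. */
double message_time(double latency_s, double bytes, double bandwidth_Bps) {
  assert(bandwidth_Bps > 0.0);
  return latency_s + bytes / bandwidth_Bps;
}

/* Estimated time to send total_bytes split evenly across n_messages
 * messages sent back to back (no overlap between them). */
double split_send_time(int n_messages, double latency_s,
                       double total_bytes, double bandwidth_Bps) {
  assert(n_messages > 0);
  return n_messages *
         message_time(latency_s, total_bytes / n_messages, bandwidth_Bps);
}
```

For example, with an assumed 50 microsecond latency, 100 MB/s of bandwidth, and an 84-byte structure, split_send_time(1, 50e-6, 84, 1e8) comes to roughly 51 microseconds, while split_send_time(4, 50e-6, 84, 1e8) comes to roughly 201 microseconds. The payload term is identical in both cases; the difference is entirely the three extra per-message latencies.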

Where to Go From Here?

This column has only touched on some parts of MPI datatypes; there is more to them than meets the eye. Stay tuned for the next column, where we'll discuss more features and issues with datatypes.

Resources
The Tao of Programming By Geoffrey James. ISBN 0931137071
MPI Forum (MPI-1 and MPI-2 specifications documents) http://www.mpi-forum.org
MPI - The Complete Reference: Volume 1, The MPI Core (2nd ed) (The MIT Press) By Marc Snir, Steve Otto, Steven Huss-Lederman, David Walker, and Jack Dongarra. ISBN 0-262-69215-5
MPI - The Complete Reference: Volume 2, The MPI Extensions (The MIT Press) By William Gropp, Steven Huss-Lederman, Andrew Lumsdaine, Ewing Lusk, Bill Nitzberg, William Saphir, and Marc Snir. ISBN 0-262-57123-4.
NCSA MPI tutorial http://webct.ncsa.uiuc.edu:8900/public/MPI/

This article was originally published in ClusterWorld Magazine. It has been updated and formatted for the web. If you want to read more about HPC clusters and Linux, you may wish to visit Linux Magazine.

Jeff Squyres is the Assistant Director for High Performance Computing for the Open Systems Laboratory at Indiana University and is one of the lead technical architects of the Open MPI project.

    Creative Commons License
    ©2005-2012 Copyright Seagrove LLC, Some rights reserved. Except where otherwise noted, this site is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License. The Cluster Monkey Logo and Monkey Character are Trademarks of Seagrove LLC.