derived datatypes / "latency hiding"

Rochus Schmid rochus.schmid at
Tue Apr 17 11:17:03 EDT 2001

Hi beowulfers,

I want to implement some standard finite-difference (FD) operations on
a 3D array in parallel with MPI.
So I will have to exchange halo (shadow/ghost, or whatever you call
them) slices of data between neighbors.

Can anyone give some advice (or a pointer to any source of info) about
the performance of the MPI implementations (LAM, MPICH) when using
derived datatypes? So my question is: is it more efficient to copy the
strided data from the boundary of the 3D array myself (possibly using
BLAS1 calls) into a contiguous buffer, or can I safely rely on derived
datatypes?
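
To make the two options concrete, here is a minimal sketch of the
"copy it myself" variant. It assumes a C array stored with k as the
fastest index; the names idx3, pack_k_face and unpack_k_face are made up
for this illustration. The comment in the code notes how the same plane
would be described with a derived datatype instead. It says nothing
about which variant is faster on a given MPI implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Offset of element (i, j, k) in an nx*ny*nz array stored with k as the
   fastest index (C row-major order). This layout is an assumption. */
static size_t idx3(size_t i, size_t j, size_t k, size_t ny, size_t nz)
{
    return (i * ny + j) * nz + k;
}

/* Gather the plane k = kplane, which is strided in memory (stride nz),
   into a contiguous buffer of nx*ny doubles. Returns the element count.
   The same copy could be done with a BLAS1 call such as
   cblas_dcopy(nx*ny, a + kplane, nz, buf, 1), or avoided entirely by
   describing the plane with
   MPI_Type_vector(nx*ny, 1, nz, MPI_DOUBLE, &face)
   and sending one element of that type starting at &a[kplane]. */
size_t pack_k_face(const double *a, size_t nx, size_t ny, size_t nz,
                   size_t kplane, double *buf)
{
    assert(kplane < nz);
    size_t n = 0;
    for (size_t i = 0; i < nx; ++i)
        for (size_t j = 0; j < ny; ++j)
            buf[n++] = a[idx3(i, j, kplane, ny, nz)];
    return n;
}

/* Inverse operation: scatter a received contiguous buffer back into
   the plane k = kplane (for example a halo plane). */
size_t unpack_k_face(double *a, size_t nx, size_t ny, size_t nz,
                     size_t kplane, const double *buf)
{
    assert(kplane < nz);
    size_t n = 0;
    for (size_t i = 0; i < nx; ++i)
        for (size_t j = 0; j < ny; ++j)
            a[idx3(i, j, kplane, ny, nz)] = buf[n++];
    return n;
}
```

With manual packing, the buffer filled by pack_k_face would be passed to
an ordinary send of nx*ny doubles, and the receiver would call
unpack_k_face on what arrives. Planes with fixed i are already
contiguous in this layout and need no packing at all.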

My second question is:
Does "latency hiding" work with these MPI implementations? By that I
mean: first start the nonblocking communication of the boundary, then
compute the inner part of the array, which does not rely on transferred
data, and finish with the computation of the boundary once the
communication has completed. Someone told me that it wouldn't help,
since MPI would "do anything" only while the program is actually
inside MPI code, so it wouldn't matter in which order the operations
are performed.
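
The ordering in question can be sketched as follows, reduced to a 1D
three-point stencil to keep it short. The function names are made up
for this illustration, and the MPI calls appear only in a comment so
that the sketch compiles without an MPI installation. It shows the
interior/boundary split that latency hiding requires, not whether any
overlap actually happens.

```c
#include <assert.h>
#include <stddef.h>

/* u has n+2 entries: u[1..n] are owned points, u[0] and u[n+1] are
   halo cells filled from the neighbors. */
static double stencil(const double *u, size_t i)
{
    return u[i - 1] - 2.0 * u[i] + u[i + 1];
}

/* Points 2..n-1 read only owned data, so this part can run while the
   halo exchange is still in flight. */
void update_interior(const double *u, double *out, size_t n)
{
    for (size_t i = 2; i < n; ++i)
        out[i] = stencil(u, i);
}

/* Points 1 and n read the halo cells, so this part has to wait until
   the communication has completed. */
void update_boundary(const double *u, double *out, size_t n)
{
    assert(n >= 1);
    out[1] = stencil(u, 1);
    if (n > 1)
        out[n] = stencil(u, n);
}

/* Reference: one plain sweep over all owned points. */
void update_all(const double *u, double *out, size_t n)
{
    for (size_t i = 1; i <= n; ++i)
        out[i] = stencil(u, i);
}

/* Intended call order in the MPI code (arguments elided):
 *
 *   MPI_Irecv(&u[0], ...);      MPI_Irecv(&u[n + 1], ...);
 *   MPI_Isend(&u[1], ...);      MPI_Isend(&u[n], ...);
 *   update_interior(u, out, n);
 *   MPI_Waitall(4, reqs, MPI_STATUSES_IGNORE);
 *   update_boundary(u, out, n);
 */
```

The split version gives the same result as update_all as long as the
halo cells are filled before update_boundary runs; the extra complexity
lies in keeping the two index ranges consistent, which in 3D means six
faces plus edges and corners.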

I am aware that the answer might depend on the specific nature of my
problem (and I should try to find out myself). But the two issues imply
different programming strategies, and it would be nice to have some
idea, e.g. whether it is worth implementing "latency hiding", because
it seems to be more complex to program (and debug :-).

Thanks in advance for any sort of info.

Best regards,



Dr. Rochus Schmid
Technische Universität München
Lehrstuhl f. Anorganische Chemie
Lichtenbergstrasse 4, 85747 Garching

Tel.    ++49 89 2891 3174
Fax.    ++49 89 2891 3473
Email   rochus.schmid at
