[Beowulf] MPI_Isend/Irecv failure for IB and large message sizes
gus at ldeo.columbia.edu
Mon Nov 16 16:55:51 EST 2009
We didn't know which compiler you used.
So what Michael sent you ("mmodel=memory_model")
is the Intel compiler flag syntax.
(PGI uses the same syntax, IIRR.)
Gcc/gfortran use "-mcmodel=memory_model" for x86_64 architecture.
I have only used this with Intel ifort, so I am not sure,
but "medium" should work fine for large data and a not-so-large program.
The "large" model doesn't seem to be implemented by gcc (4.1.2).
(Maybe it is there in newer gcc versions.)
The darn thing is that gcc says "medium" doesn't support building
shared libraries, hence you may need to build OpenMPI static libraries instead,
I would guess.
(Again, check this if you have a newer gcc version.)
Here's an excerpt of my gcc (4.1.2) man page:
-mcmodel=small
    Generate code for the small code model: the program and its
    symbols must be linked in the lower 2 GB of the address space.
    Pointers are 64 bits. Programs can be statically or dynamically
    linked. This is the default code model.

-mcmodel=kernel
    Generate code for the kernel code model. The kernel runs in
    the negative 2 GB of the address space. This model has to be
    used for Linux kernel code.

-mcmodel=medium
    Generate code for the medium model: The program is linked in
    the lower 2 GB of the address space but symbols can be located
    anywhere in the address space. Programs can be statically or
    dynamically linked, but building of shared libraries are not
    supported with the medium model.

-mcmodel=large
    Generate code for the large model: This model makes no
    assumptions about addresses and sizes of sections. Currently
    GCC does not implement this model.
If you are using OpenMPI, "ompi_info --config"
will tell you the flags used to compile it.
Mine is 1.3.2 and has no explicit mcmodel flag,
so according to the gcc man page it should default to "small".
Are you using 16GB per process or for the whole set of processes?
I hope this helps,
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
Martin Siegert wrote:
> Hi Michael,
> On Mon, Nov 16, 2009 at 10:49:23AM -0700, Michael H. Frese wrote:
>> Could it be that your MPI library was compiled using a small memory model?
>> The 180 million doubles sounds suspiciously close to a 2 GB addressing
>> This issue came up on the list recently under the topic "Fortran Array size
> I am running MPI applications that use more than 16GB of memory -
> I do not believe that this is the problem. Also -mmodel=large
> does not appear to be a valid argument for gcc under x86_64:
> gcc -DNDEBUG -g -fPIC -mmodel=large conftest.c >&5
> cc1: error: unrecognized command line option "-mmodel=large"
> - Martin
>> At 05:43 PM 11/14/2009, Martin Siegert wrote:
>>> I am running into problems when sending large messages (about
>>> 180000000 doubles) over IB. A fairly trivial example program is attached.
>>> # mpicc -g sendrecv.c
>>> # mpiexec -machinefile m2 -n 2 ./a.out
>>> id=1: calling irecv ...
>>> id=0: calling isend ...
>>> [[60322,1],1][btl_openib_component.c:2951:handle_wc] from b1 to: b2 error
>>> polling LP CQ with status LOCAL LENGTH ERROR status number 1 for wr_id
>>> 199132400 opcode 549755813 vendor error 105 qp_idx 3
>>> This is with OpenMPI-1.3.3.
>>> Does anybody know a solution to this problem?
>>> If I use MPI_Allreduce instead of MPI_Isend/Irecv, the program just hangs
>>> and never returns.
>>> I asked on the openmpi users list but got no response ...
>>> Martin Siegert
>>> Head, Research Computing
>>> WestGrid Site Lead
>>> IT Services phone: 778 782-4691
>>> Simon Fraser University fax: 778 782-4242
>>> Burnaby, British Columbia email: siegert at sfu.ca
>>> Canada V5A 1S6
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf