[Beowulf] MPI_Isend/Irecv failure for IB and large message sizes

Gus Correa gus at ldeo.columbia.edu
Mon Nov 16 16:55:51 EST 2009


Hi Martin

We didn't know which compiler you used.
So what Michael sent you ("mmodel=memory_model")
is the Intel compiler flag syntax.
(PGI uses the same syntax, IIRR.)

Gcc/gfortran use "-mcmodel=memory_model" for x86_64 architecture.
I only used this with Intel ifort, hence I am not sure,
but "medium" should work fine for large data/not-so-large program
in gcc/gfortran.
The "large" model doesn't seem to be implemented by gcc (4.1.2)
anyway.
(Maybe it is there in newer gcc versions.)
The darn thing is that gcc says "medium" doesn't support building
shared libraries,
hence you may need to build OpenMPI static libraries instead,
I would guess.
(Again, check this if you have a newer gcc version.)
Here's an excerpt of my gcc (4.1.2) man page:


        -mcmodel=small
            Generate code for the small code model: the program and its
            symbols must be linked in the lower 2 GB of the address space.
            Pointers are 64 bits.  Programs can be statically or
            dynamically linked.  This is the default code model.

        -mcmodel=kernel
            Generate code for the kernel code model.  The kernel runs in
            the negative 2 GB of the address space.  This model has to be
            used for Linux kernel code.

        -mcmodel=medium
            Generate code for the medium model: The program is linked in
            the lower 2 GB of the address space but symbols can be located
            anywhere in the address space.  Programs can be statically or
            dynamically linked, but building of shared libraries are not
            supported with the medium model.

        -mcmodel=large
            Generate code for the large model: This model makes no
            assumptions about addresses and sizes of sections.  Currently
            GCC does not implement this model.


If you are using OpenMPI, "ompi_info --config"
will tell you the flags used to compile it.
Mine is 1.3.2 and was built with no explicit mcmodel flag,
so according to the gcc man page it should default to "small".

Are you using 16GB per process or for the whole set of processes?

I hope this helps,
Gus Correa
---------------------------------------------------------------------
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
---------------------------------------------------------------------


Martin Siegert wrote:
> Hi Michael,
> 
> On Mon, Nov 16, 2009 at 10:49:23AM -0700, Michael H. Frese wrote:
>> Martin,
>>
>> Could it be that your MPI library was compiled using a small memory model?  
>> The 180 million doubles sounds suspiciously close to a 2 GB addressing 
>> limit.
>>
>> This issue came up on the list recently under the topic "Fortran Array size 
>> question."
>>
>>
>> Mike
> 
> I am running MPI applications that use more than 16GB of memory - 
> I do not believe that this is the problem. Also -mmodel=large
> does not appear to be a valid argument for gcc under x86_64:
> gcc -DNDEBUG -g -fPIC -mmodel=large   conftest.c  >&5
> cc1: error: unrecognized command line option "-mmodel=large"
> 
> - Martin
> 
>> At 05:43 PM 11/14/2009, Martin Siegert wrote:
>>> Hi,
>>>
>>> I am running into problems when sending large messages (about
>>> 180000000 doubles) over IB. A fairly trivial example program is attached.
>>>
>>> # mpicc -g sendrecv.c
>>> # mpiexec -machinefile m2 -n 2 ./a.out
>>> id=1: calling irecv ...
>>> id=0: calling isend ...
>>> [[60322,1],1][btl_openib_component.c:2951:handle_wc] from b1 to: b2 error 
>>> polling LP CQ with status LOCAL LENGTH ERROR status number 1 for wr_id 
>>> 199132400 opcode 549755813  vendor error 105 qp_idx 3
>>>
>>> This is with OpenMPI-1.3.3.
>>> Does anybody know a solution to this problem?
>>>
>>> If I use MPI_Allreduce instead of MPI_Isend/Irecv, the program just hangs
>>> and never returns.
>>> I asked on the openmpi users list but got no response ...
>>>
>>> Cheers,
>>> Martin
>>>
>>> --
>>> Martin Siegert
>>> Head, Research Computing
>>> WestGrid Site Lead
>>> IT Services                                phone: 778 782-4691
>>> Simon Fraser University                    fax:   778 782-4242
>>> Burnaby, British Columbia                  email: siegert at sfu.ca
>>> Canada  V5A 1S6
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
