[Beowulf] MPI_Isend/Irecv failure for IB and large message sizes

Michael H. Frese Michael.Frese at NumerEx-LLC.com
Mon Nov 16 12:49:23 EST 2009


Martin,

Could it be that your MPI library was compiled using a small memory 
model?  The 180 million doubles sounds suspiciously close to a 2 GB 
addressing limit.

This issue came up on the list recently under the topic "Fortran 
Array size question."


Mike

At 05:43 PM 11/14/2009, Martin Siegert wrote:
>Hi,
>
>I am running into problems when sending large messages (about
>180000000 doubles) over IB. A fairly trivial example program is attached.
>
># mpicc -g sendrecv.c
># mpiexec -machinefile m2 -n 2 ./a.out
>id=1: calling irecv ...
>id=0: calling isend ...
>[[60322,1],1][btl_openib_component.c:2951:handle_wc] from b1 to: b2 
>error polling LP CQ with status LOCAL LENGTH ERROR status number 1 
>for wr_id 199132400 opcode 549755813  vendor error 105 qp_idx 3
>
>This is with OpenMPI-1.3.3.
>Does anybody know a solution to this problem?
>
>If I use MPI_Allreduce instead of MPI_Isend/Irecv, the program just hangs
>and never returns.
>I asked on the openmpi users list but got no response ...
>
>Cheers,
>Martin
>
>--
>Martin Siegert
>Head, Research Computing
>WestGrid Site Lead
>IT Services                                phone: 778 782-4691
>Simon Fraser University                    fax:   778 782-4242
>Burnaby, British Columbia                  email: siegert at sfu.ca
>Canada  V5A 1S6
>
>
>_______________________________________________
>Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
>To change your subscription (digest mode or unsubscribe) visit 
>http://www.beowulf.org/mailman/listinfo/beowulf

