[Beowulf] MPI_Isend/Irecv failure for IB and large message sizes

Martin Siegert siegert at sfu.ca
Sat Nov 14 19:43:27 EST 2009


Hi,

I am running into problems when sending large messages (about
180000000 doubles, roughly 1.44 GB) over InfiniBand (IB). A fairly
trivial example program is attached.
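
For reference, the size arithmetic behind that figure can be sketched as
below. This is only an illustration, not part of the attached program: the
helper names message_bytes and num_chunks are hypothetical, an 8-byte
double is assumed, and the 2^24-element chunk size mentioned afterwards is
an arbitrary choice. Nothing here establishes that message size is the
actual cause of the error.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes occupied by `count` doubles. The multiplication is done in
 * 64 bits so the result cannot wrap the way a 32-bit int product could.
 * Assumes sizeof(double) == 8, as on common platforms. */
uint64_t message_bytes(uint64_t count)
{
    return count * (uint64_t)sizeof(double);
}

/* Number of pieces needed to cover `count` elements when each piece
 * holds at most `chunk_elems` elements (ceiling division).
 * Hypothetical helper for experimenting with smaller transfers. */
uint64_t num_chunks(uint64_t count, uint64_t chunk_elems)
{
    assert(chunk_elems > 0);
    return (count + chunk_elems - 1) / chunk_elems;
}
```

With count = 180000000, message_bytes gives 1440000000 bytes. The element
count itself fits in the int count argument of MPI_Isend, and the byte
total is below 2^31 but above 2^30. Splitting the buffer into pieces of
2^24 doubles (128 MiB each) would take num_chunks(180000000, 16777216) = 11
transfers, which is one way to test whether smaller messages get through.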

# mpicc -g sendrecv.c
# mpiexec -machinefile m2 -n 2 ./a.out
id=1: calling irecv ...
id=0: calling isend ...
[[60322,1],1][btl_openib_component.c:2951:handle_wc] from b1 to: b2 error polling LP CQ with status LOCAL LENGTH ERROR status number 1 for wr_id 199132400 opcode 549755813  vendor error 105 qp_idx 3

This is with Open MPI 1.3.3.
Does anybody know a solution to this problem?

If I use MPI_Allreduce instead of MPI_Isend/Irecv, the program just hangs
and never returns.
I asked on the Open MPI users list but got no response.

Cheers,
Martin

-- 
Martin Siegert
Head, Research Computing
WestGrid Site Lead
IT Services                                phone: 778 782-4691
Simon Fraser University                    fax:   778 782-4242
Burnaby, British Columbia                  email: siegert at sfu.ca
Canada  V5A 1S6
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sendrecv.c
Type: text/x-c++src
Size: 1054 bytes
Desc: not available
URL: <http://www.clustermonkey.net/pipermail/beowulf/attachments/20091114/8a57ca20/attachment-0001.bin>