[Beowulf] IB problem/using IB diagnostics
prentice at ias.edu
Fri Jun 19 12:48:37 EDT 2009
Gus Correa wrote:
> Prentice Bisbal wrote:
>> John Hearns wrote:
>>> 2009/6/18 Prentice Bisbal <prentice at ias.edu>
>>> John Hearns wrote:
>>> > Can you log into node36 and run ibstat or ibstatus?
>>> Looks good to me!
>>> Links are up and it sees a subnet manager. As Greg says, looks like
>>> something wonky in the script which is reporting
>>> the node status??
>> It's actually an MPI job (HPL using OpenMPI) which is reporting the
>> The head scratching continues...
> Hi Prentice, list
> Just in case you haven't seen this ...
> Are you using OpenMPI 1.3.0 or 1.3.1?
> Those versions have a memory leak bug when using IB.
> The solution for the memory leak is to upgrade to 1.3.2.
> A workaround is to use -mca mpi_leave_pinned 0.
> My HPL with OpenMPI 1.3.1 crashed when using lots of memory.
> I upgraded to 1.3.2, which fixed the problem,
> but I never looked at the error messages,
> so your problem may be different.
> However, memory leaks can produce weird errors that are hard to diagnose.
I'm using OpenMPI 1.2.8
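
For anyone applying Gus's rule of thumb across several installs, here is a minimal Python sketch of the check. It assumes only what the thread states: 1.3.0 and 1.3.1 are the affected releases, and the workaround is the mpi_leave_pinned MCA parameter, passed in mpirun's usual "-mca <name> <value>" form. The function names, the "./xhpl" binary name and the "-np 4" process count are hypothetical, chosen for illustration.

```python
# Open MPI releases Gus names as having the IB memory leak (per the thread).
AFFECTED = {(1, 3, 0), (1, 3, 1)}


def parse_version(text):
    """Turn '1.3.1' into (1, 3, 1); a two-part '1.3' is padded to (1, 3, 0)."""
    parts = [int(p) for p in text.strip().split(".")]
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts[:3])


def is_affected(version_text):
    """True if this version is one of the releases listed in AFFECTED."""
    return parse_version(version_text) in AFFECTED


def mpirun_args(version_text, base_args):
    """Build an mpirun argument list, adding the workaround only when needed."""
    args = ["mpirun"]
    if is_affected(version_text):
        # Workaround from the thread: disable leave-pinned behaviour.
        args += ["-mca", "mpi_leave_pinned", "0"]
    return args + list(base_args)


if __name__ == "__main__":
    # Hypothetical HPL launch line; adjust the binary and -np to your setup.
    for version in ("1.2.8", "1.3.1", "1.3.2"):
        print(version, mpirun_args(version, ["-np", "4", "./xhpl"]))
```

By this check 1.2.8 falls outside the affected set, so the workaround flag is not added for it.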