Try disabling only shared memory.<br>Open MPI&#39;s shared memory buffer is limited in size, and the job deadlocks if you overflow it.<br>Because Open MPI busy-waits, the deadlock looks like a livelock.<br><br><br><div class="gmail_quote">2008/7/9 Ashley Pittman &lt;<a href="mailto:apittman@concurrent-thinking.com">apittman@concurrent-thinking.com</a>&gt;:<br>

<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><div class="Ih2E3d">On Tue, 2008-07-08 at 22:01 -0400, Joe Landman wrote:<br>

&gt; &nbsp; &nbsp;Short version: &nbsp;The code starts and runs. &nbsp;Reads in its data. &nbsp;Starts<br>

&gt; its iterations. &nbsp;And then somewhere after this, it hangs. &nbsp;But not<br>

&gt; always at the same place. &nbsp;It doesn&#39;t write state data back out to the<br>

&gt; disk, just logs. &nbsp;Rerunning it gets it to a different point, sometimes<br>

&gt; hanging sooner, sometimes later. &nbsp;Seems to be the case on multiple<br>

&gt; different machines, with different OSes. &nbsp;Working on comparing MPI<br>

&gt; distributions, and it hangs with IB as well as with shared memory and<br>

&gt; tcp sockets.<br>

<br>

</div>Sounds like you&#39;ve found a bug. It doesn&#39;t sound too difficult to find;<br>

comments in-line.<br>

<div class="Ih2E3d"><br>

&gt; &nbsp; &nbsp;Right now we are using OpenMPI 1.2.6, and this code does use<br>

&gt; allreduce. &nbsp;When it hangs, an strace of the master process shows lots of<br>

&gt; polling:<br>

<br>

</div>Why do you mention allreduce? Does the code tend to be in allreduce when it<br>

hangs? Is it perhaps happening at the same place but on a different iteration<br>

every time? This is quite important. You could have either a<br>

&quot;random&quot; memory corruption, which can cause the program to stop anywhere<br>

and is often hard to find, or a race condition, which is easier to deal<br>

with. If there are any similarities in the stack traces between hangs, that tends to point<br>

to the latter.<br>

<br>

allreduce is one of the collective functions with an implicit barrier,<br>

which means that *no* process can return from it until *all* processes<br>

have called it. If your program uses allreduce extensively it&#39;s entirely<br>

possible that one process has stopped for whatever reason and the<br>

rest have continued as far as they can until they too deadlock. Collectives<br>

often get accused of causing programs to hang when in reality N-1<br>

processes are in the collective call and 1 is off somewhere else.<br>
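<br>
To make the N-1 picture concrete, here is a toy sketch in Python. It is not MPI: threads stand in for ranks, and a threading.Barrier stands in for the implicit barrier inside allreduce. The names run_job and missing_rank are made up for this illustration, and the timeout exists only so the demo terminates instead of hanging the way the real job would.<br>

```python
import threading


def run_job(n_ranks, missing_rank=None, timeout=0.5):
    """Toy model of a collective: each thread is a 'rank' (hypothetical names)."""
    # The barrier plays the role of the implicit barrier in allreduce.
    barrier = threading.Barrier(n_ranks)
    status = {}

    def rank_main(rank):
        if rank == missing_rank:
            # This rank is "off somewhere else" and never reaches the collective.
            status[rank] = "never called the collective"
            return
        try:
            # Nobody gets past this point until all n_ranks have arrived.
            barrier.wait(timeout=timeout)
            status[rank] = "returned"
        except threading.BrokenBarrierError:
            # In a real MPI job there is no timeout; the rank would spin here.
            status[rank] = "stuck in collective"

    threads = [threading.Thread(target=rank_main, args=(r,)) for r in range(n_ranks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return status


if __name__ == "__main__":
    print(run_job(4))                  # every rank calls the collective
    print(run_job(4, missing_rank=2))  # one rank never shows up
```

When every rank calls the collective, all of them return. When one rank is elsewhere, it is the other N-1 that look hung, which is why a stack trace from every process matters more than one from the master alone.<br>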

<div class="Ih2E3d"><br>

&gt; c1-1:~ # strace -p 8548<br>

<br>

</div>&gt; [spin forever]<br>

<br>

Any chance of a stack trace, preferably a parallel one? &nbsp;I assume *all*<br>

processes in the job are in the R state? &nbsp;Do you have a mechanism<br>

available to allow you to see the message queues?<br>

<div class="Ih2E3d"><br>

&gt; So it looks like the process is waiting for the appropriate posting on<br>

&gt; the internal scoreboard, and just hanging in a tight loop until this<br>

&gt; actually happens.<br>

&gt;<br>

&gt; But these hangs usually happen at the same place each time for a logic<br>

&gt; error.<br>

<br>

</div>Like in allreduce you mean?<br>

<div class="Ih2E3d"><br>

&gt; But the odd thing about this code is that it worked fine 12 - 18 months<br>

&gt; ago, and we haven&#39;t touched it since (nor has it changed). &nbsp;What has<br>

&gt; changed is that we are now using OpenMPI 1.2.6.<br>

<br>

</div>The other important thing to know here is what you have changed *from*.<br>

<div class="Ih2E3d"><br>

&gt; So the code hasn&#39;t changed, and the OS on which it runs hasn&#39;t changed,<br>

&gt; but the MPI stack has. &nbsp;Yeah, thats a clue.<br>

<br>

&gt; Turning off openib and tcp doesn&#39;t make a great deal of impact. &nbsp;This is<br>

&gt; also a clue.<br>

<br>

</div>So it&#39;s likely algorithmic? &nbsp;You could turn off shared memory as well<br>

but it won&#39;t make a great deal of impact so there isn&#39;t any point.<br>

<div class="Ih2E3d"><br>

&gt; I am looking now to trying mvapich2 and seeing how that goes. &nbsp;Using<br>

&gt; Intel and gfortran compilers (Fortran/C mixed code).<br>

&gt;<br>

&gt; Anyone see strange things like this with their MPI stacks?<br>

<br>

</div>All the time. It&#39;s not really strange, just what happens on large<br>

systems, especially when developing MPI or applications.<br>

<div class="Ih2E3d"><br>

&gt; I&#39;ll try all the usual things (reduce the optimization level, etc).<br>

&gt; Sage words of advice (and clue sticks) welcome.<br>

<br>

</div>Is it the application which hangs or a combination of the application<br>

and the dataset you give it? &nbsp;What&#39;s the smallest process count and<br>

timescale you can reproduce this on?<br>

<br>

You could try valgrind, which works well with Open MPI. It will help you<br>

with memory corruption but won&#39;t be of much help if you have a race<br>

condition. Going by reputation, Marmot might be of some use: it&#39;ll point<br>

out if you are doing anything silly with MPI calls. There is enough<br>

flexibility in the standard that you can do something completely illegal<br>

but have it work in 90% of cases, and Marmot should pick up on these.<br>

<a href="http://www.hlrs.de/organization/amt/projects/marmot/" target="_blank">http://www.hlrs.de/organization/amt/projects/marmot/</a><br>

<br>

We could take this off-line if you prefer, this could potentially get<br>

quite involved...<br>

<font color="#888888"><br>

Ashley Pittman.<br>

</font><div><div></div><div class="Wj3C7c"><br>

_______________________________________________<br>

Beowulf mailing list, <a href="mailto:Beowulf@beowulf.org">Beowulf@beowulf.org</a><br>


</div></div></blockquote></div><br>
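The buffer-overflow scenario described at the top of this message can be modelled with a small Python sketch. This is a toy under stated assumptions, not Open MPI&#39;s shared memory implementation: two threads play the ranks, a bounded queue.Queue plays each rank&#39;s receive buffer, and exchange, buffer_slots and deadline are hypothetical names. Each rank sends everything before it receives anything and polls for a free slot rather than sleeping, so when the messages do not fit, both ranks spin on a full buffer. The deadline is there only so the demo ends.<br>

```python
import queue
import threading
import time


def exchange(n_messages, buffer_slots, deadline=0.5):
    """Two toy 'ranks' each send n_messages to the other, then receive.

    buffer_slots must be at least 1 (queue.Queue treats 0 as unbounded).
    """
    # inbox[r] is the bounded receive buffer owned by rank r.
    inbox = [queue.Queue(maxsize=buffer_slots) for _ in range(2)]
    outcome = [None, None]

    def rank_main(me):
        peer = 1 - me
        give_up = time.monotonic() + deadline
        for i in range(n_messages):
            while True:
                try:
                    # Busy-wait: poll for a free slot instead of blocking.
                    inbox[peer].put_nowait(i)
                    break
                except queue.Full:
                    if time.monotonic() > give_up:
                        # A real job would keep spinning here forever.
                        outcome[me] = "spinning on a full buffer"
                        return
        # Receives only start once every send has been queued.
        for _ in range(n_messages):
            inbox[me].get()
        outcome[me] = "done"

    threads = [threading.Thread(target=rank_main, args=(r,)) for r in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return outcome


if __name__ == "__main__":
    print(exchange(n_messages=2, buffer_slots=4))  # messages fit in the buffer
    print(exchange(n_messages=5, buffer_slots=2))  # messages overflow the buffer
```

In the overflow case both threads stay runnable and burn CPU without making progress, so from the outside it resembles a livelock even though the underlying problem is a deadlock.<br>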