[Beowulf] An annoying MPI problem
coutinho at dcc.ufmg.br
Wed Jul 9 18:58:35 EDT 2008
Try disabling shared memory only.

Open MPI's shared-memory buffer is limited, and the sm transport can
deadlock if that buffer fills up without being drained. As Open MPI uses
busy waiting, the deadlock appears as a livelock: the stuck processes
spin at 100% CPU instead of blocking.
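To test this, Open MPI lets you exclude the shared-memory BTL at launch
time via MCA parameters. A minimal sketch, assuming an Open MPI 1.2-era
mpirun; the executable name ./app and the rank count are placeholders:

```shell
# Exclude the shared-memory BTL; same-node ranks fall back to TCP.
# The ^ prefix means "everything except the listed components".
mpirun --mca btl ^sm -np 16 ./app

# Equivalent allow-list form: name the transports explicitly.
# "self" is required so a process can send to itself.
mpirun --mca btl tcp,self -np 16 ./app
```

If the hang disappears with sm disabled, that points at the
shared-memory transport's bounded buffers rather than at the
application itself.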
2008/7/9 Ashley Pittman <apittman at concurrent-thinking.com>:
> On Tue, 2008-07-08 at 22:01 -0400, Joe Landman wrote:
> > Short version: The code starts and runs. Reads in its data. Starts
> > its iterations. And then somewhere after this, it hangs. But not
> > always at the same place. It doesn't write state data back out to the
> > disk, just logs. Rerunning it gets it to a different point, sometimes
> > hanging sooner, sometimes later. Seems to be the case on multiple
> > different machines, with different OSes. Working on comparing MPI
> > distributions, and it hangs with IB as well as with shared memory and
> > tcp sockets.
> Sounds like you've found a bug, and it doesn't sound too difficult to
> find. Comments in-line.
> > Right now we are using OpenMPI 1.2.6, and this code does use
> > allreduce. When it hangs, an strace of the master process shows lots of
> > polling:
> Why do you mention allreduce? Does it tend to be in allreduce when it
> hangs? Is it happening at the same place but on a different iteration
> every time, perhaps? This is quite important: you could either have
> "random" memory corruption, which can cause the program to stop anywhere
> and is often hard to find, or a race condition, which is easier to deal
> with. If there are any similarities in the stack then that tends to
> point to the latter.
> allreduce is one of the collective functions with an implicit barrier,
> which means that *no* process can return from it until *all* processes
> have called it. If your program uses allreduce extensively, it's
> entirely possible that one process has stopped for whatever reason and
> the rest have continued as far as they can until they too deadlock.
> Collectives often get accused of causing programs to hang when in
> reality N-1 processes are in the collective call and 1 is off somewhere
> else.
> > c1-1:~ # strace -p 8548
> > [spin forever]
> Any chance of a stack trace, preferably a parallel one? I assume *all*
> processes in the job are in the R state? Do you have a mechanism
> available to allow you to see the message queues?
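One low-tech way to approximate the parallel stack trace asked for above
is to briefly attach gdb to every rank on a node. A sketch, assuming gdb
is installed and the ranks show up under the process name app (a
placeholder for the real executable name):

```shell
# Grab a backtrace from every rank on this node without killing them;
# gdb detaches when the batch commands finish.
for pid in $(pgrep app); do
    echo "=== pid $pid ==="
    gdb -batch -p "$pid" -ex 'thread apply all bt'
done
```

Comparing the traces across ranks quickly shows whether N-1 processes
are sitting in a collective while one is off somewhere else.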
> > So it looks like the process is waiting for the appropriate posting on
> > the internal scoreboard, and just hanging in a tight loop until this
> > actually happens.
> > But these hangs usually happen at the same place each time for a logic
> > error.
> Like in allreduce you mean?
> > But the odd thing about this code is that it worked fine 12 - 18 months
> > ago, and we haven't touched it since (nor has it changed). What has
> > changed is that we are now using OpenMPI 1.2.6.
> The other important thing to know here is what you have changed *from*.
> > So the code hasn't changed, and the OS on which it runs hasn't changed,
> > but the MPI stack has. Yeah, thats a clue.
> > Turning off openib and tcp doesn't make a great deal of impact. This is
> > also a clue.
> So it's likely algorithmic? You could turn off shared memory as well
> but it won't make a great deal of impact so there isn't any point.
> > I am now looking at trying mvapich2 to see how that goes. Using the
> > Intel and gfortran compilers (mixed Fortran/C code).
> > Anyone see strange things like this with their MPI stacks?
> All the time; it's not really strange, just what happens on large
> systems, especially when developing MPI or applications.
> > I'll try all the usual things (reduce the optimization level, etc).
> > Sage words of advice (and clue sticks) welcome.
> Is it the application which hangs or a combination of the application
> and the dataset you give it? What's the smallest process count and
> timescale you can reproduce this on?
> You could try valgrind, which works well with openmpi; it will help you
> with memory corruption but won't be of much help if you have a race
> condition. Going by reputation, Marmot might be of some use: it'll
> point out if you are doing anything silly with MPI calls. There is
> enough flexibility in the standard that you can do something completely
> illegal but have it work in 90% of cases; Marmot should pick up on
> these.
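Running each rank under valgrind is straightforward with Open MPI, since
mpirun can launch valgrind, which in turn launches the application. A
sketch, with ./app again standing in for the real executable:

```shell
# One valgrind log per rank; valgrind expands %p to the process pid
# in the log filename, so the ranks don't clobber each other's output.
mpirun -np 4 valgrind --log-file=vg.%p.log ./app
```

On a hang you can kill the job and inspect the per-rank logs for any
invalid reads or writes leading up to the stall.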
> We could take this off-line if you prefer, this could potentially get
> quite involved...
> Ashley Pittman.
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf