[Beowulf] mpich program segfaults

Douglas Eadline, Cluster World Magazine deadline at linux-mag.com
Thu Mar 4 21:46:16 EST 2004


Don't give up on the G5 just yet.

It sounds to me like you may be stepping on memory somewhere. That would
mean the crash shows up at that particular spot in the code, but the cause
is probably somewhere else in the program: an earlier stray write corrupts
memory that is only used later.
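To make "stepping on memory" concrete, here is a minimal C sketch. The helper names and the guard value are hypothetical (nothing here comes from MPICH or any library). It pads an array with a guard slot at each end, fills the guards with a known value, and checks them right after a suspect routine runs. An off-by-one loop then trips the check immediately, next to its cause, instead of silently corrupting a neighboring variable and crashing somewhere unrelated. Tools such as Electric Fence apply a similar idea using protected memory pages.

```c
#include <assert.h>
#include <stdlib.h>

/* Arbitrary sentinel value for the guard slots (an assumption). */
#define GUARD 0x5A5A5A5A

/* Hypothetical helper: allocate n ints plus one guard slot on each side
   and return a pointer to the usable part. */
static int *guarded_alloc(size_t n)
{
    int *base = malloc((n + 2) * sizeof *base);
    assert(base != NULL);
    base[0] = GUARD;        /* guard before the data */
    base[n + 1] = GUARD;    /* guard after the data  */
    return base + 1;
}

/* Return 1 if both guards are intact, 0 if something stepped on them. */
static int guarded_ok(const int *p, size_t n)
{
    return p[-1] == GUARD && p[n] == GUARD;
}

static void guarded_free(int *p)
{
    free(p - 1);
}

/* Deliberately buggy fill: "<=" writes one element past the end. */
static void fill_buggy(int *a, size_t n)
{
    for (size_t i = 0; i <= n; i++)
        a[i] = (int)i;
}

/* Correct fill for comparison. */
static void fill_correct(int *a, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = (int)i;
}
```

After fill_correct(a, 10) on a guarded 10-element array, guarded_ok(a, 10) returns 1; after fill_buggy(a, 10) it returns 0, which points straight at the faulty loop.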

There are several "simple" things you can do to collect evidence that may
help you solve this "crime" (this is detective work, by the way).

First, this sounds like the kind of thing that happens in C programs. Is
the code pure Fortran? Which version of MPICH are you using?

1) Try another compiler. If you are lucky, it will find the problem. The
code may also just work, in which case you will want to blame the first
compiler. Don't: that is probably not the case. The new compiler probably
lays out memory differently than the first one, and you just got lucky.

2) Run your code on another architecture.

3) Try another MPI implementation (LAM?).
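The point in item 1 about memory layout can be shown with a toy model in C. The offsets are made up for illustration and do not describe what any real compiler does. Memory is one flat array, and a "layout" says where two variables, x[4] and y[4], were placed. The same off-by-one loop that writes x[4] destroys y under one layout and lands in unused memory under the other, so the program "works" even though the bug is unchanged.

```c
#include <assert.h>
#include <string.h>

/* Toy memory: one flat array of ints. */
#define MEM_SIZE 16

/* Hypothetical layout: where x[0..3] and y[0..3] start in mem. */
typedef struct {
    int x_at;
    int y_at;
} layout;

/* Buggy routine: meant to clear x[0..3], but "<=" also writes x[4]. */
static void buggy_write(int *mem, layout L)
{
    assert(L.x_at + 4 < MEM_SIZE);   /* keep the toy model itself in bounds */
    for (int i = 0; i <= 4; i++)
        mem[L.x_at + i] = -1;
}

/* Run the bug under a layout; return 1 if y was corrupted, else 0. */
static int y_corrupted(layout L)
{
    int mem[MEM_SIZE];
    memset(mem, 0, sizeof mem);
    for (int i = 0; i < 4; i++)
        mem[L.y_at + i] = 100 + i;   /* y holds 100..103 */
    buggy_write(mem, L);
    for (int i = 0; i < 4; i++)
        if (mem[L.y_at + i] != 100 + i)
            return 1;
    return 0;
}
```

With x at offset 0 and y at offset 4, the stray write hits y[0] and y_corrupted returns 1. With y at offset 0 and x at offset 8, the stray write lands at offset 12, nothing visible breaks, and it returns 0: the "lucky" compiler.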

I am sure there are other approaches, but without knowing the particulars
I cannot suggest anything else.

Doug



On Thu, 4 Mar 2004, Glen Kaukola wrote:

> Douglas Eadline, Cluster World Magazine wrote:
> 
> >What type of machine is this?
> 
> An Apple G5.
> 
> And actually I've figured out what's wrong.  Sorta.  =)
> 
> I replaced my problematic subroutine with a dummy subroutine that 
> contains nothing but variable declarations and a print statement.  This 
> still caused a segmentation fault.  So I commented pretty much 
> everything out.  No segmentation fault.  Alright then.  I slowly added 
> it all back in, checking each time to see if I got a segmentation fault.
> 
> And now I'm down to 4 variable declarations that are causing a problem:
> REAL    ZFGLURG( NCOLS,NROWS,0:NLAYS )
> INTEGER ICASE  ( NCOLS,NROWS,0:NLAYS )
> REAL    THETAV ( NCOLS,NROWS,NLAYS )
> REAL    ZINT   ( NCOLS,NROWS,NLAYS )
> 
> If I uncomment any one of those, I get a segmentation fault again.
> 
> But it still doesn't make any sense.  First of all, there are variable 
> declarations almost exactly like the ones I listed and those don't cause 
> a problem.  I also made a small test case that called my dummy 
> subroutine and that worked just fine.  I then commented out everything 
> but the problematic variable declarations I listed above and that worked 
> just fine.  I tried changing the variable names but that didn't seem to 
> make a difference, as I still got a segmentation fault.  So I have no 
> idea what the heck is going on.  I think I need to tell my boss we need 
> to give up on G5's.
> 
> 
> Glen
> 
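The four arrays in the quoted declarations are not quite alike: ZFGLURG and ICASE run their third index over 0:NLAYS (NLAYS+1 layers), while THETAV and ZINT run over 1:NLAYS. Element (i,j,k) of the first pair is therefore not at the same memory offset as (i,j,k) of the second, so if any code passes one shape where the other is expected, or uses k=0 on a 1:NLAYS array, it reads or writes the wrong storage. The C sketch below spells out Fortran's column-major addressing; the function names and grid sizes are hypothetical, and the assert lines stand in for the kind of check a compiler's array-bounds-checking option inserts.

```c
#include <assert.h>
#include <stddef.h>

/* Column-major (Fortran) element offset of A(i,j,k) for an array declared
   A(NCOLS, NROWS, KLO:KHI); i and j start at 1. */
static size_t f_offset(int i, int j, int k,
                       int ncols, int nrows, int klo, int khi)
{
    /* Bounds checks, like those a bounds-checking compile would add. */
    assert(i >= 1 && i <= ncols);
    assert(j >= 1 && j <= nrows);
    assert(k >= klo && k <= khi);
    return (size_t)(i - 1)
         + (size_t)ncols * ((size_t)(j - 1)
         + (size_t)nrows * (size_t)(k - klo));
}

/* Number of elements in A(NCOLS, NROWS, KLO:KHI). */
static size_t f_count(int ncols, int nrows, int klo, int khi)
{
    return (size_t)ncols * (size_t)nrows * (size_t)(khi - klo + 1);
}
```

With a hypothetical 10 x 20 grid and NLAYS = 5, ZFGLURG holds 10*20*6 = 1200 elements and THETAV holds 10*20*5 = 1000 (4800 versus 4000 bytes, assuming 4-byte default REAL), and element (1,1,1) sits at offset 200 in ZFGLURG but at offset 0 in THETAV.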

-- 
----------------------------------------------------------------
Editor-in-chief                   ClusterWorld Magazine
Desk: 610.865.6061                            
Cell: 610.390.7765         Redefining High Performance Computing
Fax:  610.865.6618                www.clusterworld.com


_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf


