semaphore problem with mpich-1.2.5

bvds at bvds.geneva.edu
Mon Jul 7 23:13:46 EDT 2003


I have an Opteron system running GinGin64 with 
a 2.4.21 kernel and gcc-3.3.  I compiled
mpich-1.2.5 with --with-comm=shared, but mpirun 
crashes with the error:

 semget failed for setnum = 0

This is a known problem with mpich (see 
http://www-unix.mcs.anl.gov/mpi/mpich/buglist-tbl.html).

Has anyone else seen this error?

I found a discussion by Douglas Roberts at LANL, reprinted below
(http://www.bohnsack.com/lists/archives/xcat-user/1275.html).
His fix worked for me.  Does anyone know of a "real" solution?

Brett van de Sande

********************************************************************

I think the reason we get semget errors is that the operating system is not
releasing inter-process communication resources (e.g. semaphores) when a
job is finished. It's possible to do this manually. ...
I wrote the following script, which removes
all the shared memory and semaphore resources held by the user:

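#! /bin/csh

# Remove shared memory segments. NR>4 skips the header lines that
# ipcs prints; column 2 of each remaining row is the resource id.
foreach id (`ipcs -m | gawk 'NR>4 {print $2}'`)
        ipcrm shm $id
end

# Remove semaphore sets, using the same header-skipping rule.
foreach id (`ipcs -s | gawk 'NR>4 {print $2}'`)
        ipcrm sem $id
end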

********************************************************************
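
The script above picks out ids by skipping a fixed number of header
lines, which depends on the exact layout that ipcs prints. Below is a
rough sketch of the same cleanup in POSIX sh that selects rows by a
numeric id in column 2 and by the owner in column 3. It is not part of
Douglas Roberts's post. The function names owned_ids and cleanup_ipc
and the --run argument are made up for illustration, and the sketch
assumes the Linux (util-linux) ipcs column order "key id owner ...",
an ipcrm that accepts -m and -s, and a user name short enough that
ipcs does not truncate it.

```sh
#!/bin/sh
# Sketch only: assumes util-linux style ipcs output, where column 2 is
# the resource id and column 3 is the owner. Names are hypothetical.

# Read ipcs output on stdin and print the id of every row owned by $1.
# Header and blank lines are skipped because their second field is not
# a number, so no fixed header line count is needed.
owned_ids() {
    awk -v user="$1" '$2 ~ /^[0-9]+$/ && $3 == user { print $2 }'
}

# Remove the shared memory segments and semaphore sets that belong to
# the user running the script.
cleanup_ipc() {
    user=$(id -un)
    for id in $(ipcs -m | owned_ids "$user"); do
        ipcrm -m "$id"
    done
    for id in $(ipcs -s | owned_ids "$user"); do
        ipcrm -s "$id"
    done
}

# Nothing is removed unless the script is called with --run.
if [ "${1:-}" = "--run" ]; then
    cleanup_ipc
fi
```

Saved as, say, cleanipc.sh, it would be run as "sh cleanipc.sh --run"
after a crashed job. Like the original, it should not be run while
another MPI job of the same user is still active, since it would
remove that job's semaphores as well.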



_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf


