semaphore problem with mpich-1.2.5
bvds at bvds.geneva.edu
Mon Jul 7 23:13:46 EDT 2003
I have an Opteron system running GinGin64 with
a 2.4.21 kernel and gcc-3.3. I compiled
mpich-1.2.5 with --with-comm=shared, but mpirun
crashes with the error:
semget failed for setnum = 0
This is a known problem with mpich (see
http://www-unix.mcs.anl.gov/mpi/mpich/buglist-tbl.html).
Has anyone else seen this error?
I found a discussion by Douglas Roberts at LANL, reprinted below
(http://www.bohnsack.com/lists/archives/xcat-user/1275.html).
His fix worked for me. Does anyone know of a "real" solution?
Brett van de Sande
********************************************************************
I think the reason we get semget errors is that the operating system is not
releasing inter-process communication resources (e.g. semaphores) when a
job is finished. It's possible to do this manually. ...
I wrote the following script, which removes
all the shared memory and semaphore resources held by the user:
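#! /bin/csh
# Remove every shared memory segment that ipcs lists.
# NR>4 is intended to skip the ipcs header lines; $2 is the id column.
foreach id (`ipcs -m | gawk 'NR>4 {print $2}'`)
    ipcrm shm $id
end
# Do the same for semaphore sets.
foreach id (`ipcs -s | gawk 'NR>4 {print $2}'`)
    ipcrm sem $id
end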
********************************************************************
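The script above skips a fixed number of header lines and passes every id that ipcs lists to ipcrm, including ids owned by other users (ipcrm simply fails on those). What follows is a minimal Bourne-shell sketch of the same cleanup, not a tested replacement. It assumes the util-linux ipcs column layout (key, id, owner, ...) with the key printed as a hexadecimal number, and it uses the -m/-s option form of ipcrm. The function names ipc_ids_for_owner and clean_my_ipc are hypothetical, chosen for illustration.

```shell
#!/bin/sh
# Print the ids owned by one user, reading ipcs-style text on stdin.
# Assumption: columns are key, id, owner, ... and data rows start with a
# hexadecimal key, so header and blank lines are skipped by pattern
# rather than by counting lines.
ipc_ids_for_owner() {
    awk -v owner="$1" '$1 ~ /^0x[0-9a-fA-F]+$/ && $3 == owner { print $2 }'
}

# Remove the calling user's shared memory segments and semaphore sets.
clean_my_ipc() {
    me=$(id -un)
    for id in $(ipcs -m | ipc_ids_for_owner "$me"); do
        ipcrm -m "$id"
    done
    for id in $(ipcs -s | ipc_ids_for_owner "$me"); do
        ipcrm -s "$id"
    done
}
```

Sourcing this file and calling clean_my_ipc after a crashed run should clear only your own leftover segments and semaphores. Like the original script, it will also remove resources that a still-running job of yours is using, so run it only when none of your jobs are active.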
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf