[Beowulf] Re: after update sgeexecd not starting correctly on reboot

David Mathog mathog at caltech.edu
Tue Nov 25 19:08:15 EST 2008

> I think  maybe the NFS mounting is different, so that the remote_fs
> prerequisite isn't really satisfied, even though the associated script
> has run.  The sgeexecd script does include a test:
> while [ ! -d "$SGE_ROOT" -a $count -le 120 ]; do
>    count=`expr $count + 1`
>    sleep 1
> done

This seems to have been it.  Changing "$SGE_ROOT" to "$SGE_ROOT/bin"
let SGE come up OK on a couple of consecutive reboots.  That is not
definitive proof that this was the issue, but at least it seems like
progress.  Apparently the SGE init script was reaching this test before
$SGE_ROOT was actually mounted.  The -d test always passed, NFS mounted
or not, presumably because the empty mount point directory exists
locally either way, and of course the SGE startup then failed since
none of the code from the remote system was reachable.  Just for kicks
I added an echo line within the loop, so that if it sticks there it
will show up on the console.
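
For reference, the modified wait could look roughly like the sketch
below.  This is not the stock sgeexecd code: the function name
wait_for_sge_bin, the /opt/sge fallback for SGE_ROOT, starting the
counter at 0, and the wording of the echo message are all assumptions
made for illustration.  Only the "$SGE_ROOT/bin" test, the 120 second
limit, and the idea of an echo inside the loop come from the above.

```shell
#!/bin/sh
# Sketch only: the default path and function name are hypothetical.
SGE_ROOT="${SGE_ROOT:-/opt/sge}"

wait_for_sge_bin() {
    # $1 = maximum number of seconds to wait (the quoted loop uses 120)
    max="${1:-120}"
    count=0
    # Test a directory that exists only on the NFS export, since the bare
    # mount point directory is present whether or not NFS is mounted.
    while [ ! -d "$SGE_ROOT/bin" ] && [ "$count" -le "$max" ]; do
        # Make the wait visible on the console if the boot sticks here.
        echo "sgeexecd: waiting for $SGE_ROOT/bin (${count}s)"
        count=$((count + 1))
        sleep 1
    done
    # Exit status reports whether the directory finally appeared.
    [ -d "$SGE_ROOT/bin" ]
}
```

In an init script this might be called as "wait_for_sge_bin 120 || exit 1"
just before the daemon is started, so that a missing mount stops the
script at the wait rather than letting the daemon start fail later.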


David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech