OpenPBS under RH7?

Roger L. Smith roger at ERC.MsState.Edu
Thu Apr 26 10:46:16 EDT 2001


Hello Folks,

We've got a fairly large Linux cluster that we are finally about to
release to our users.  During all of our testing and benchmarking, we've
been using RedHat 7.0 and the 2.4.2 kernel.  We had worked out almost all
of the problems until we installed OpenPBS (v2.3.12).

Our test users have been experiencing a problem where the pbs_mom daemon
on the first node of a given job dies.  It typically seems to die either
when the job finishes or when the user tries to qdel the job.  The
failure is intermittent: one of our users estimates that it happens about
2 out of 5 times.  To recover, we have to delete everything matching
/var/spool/PBS/mom_priv/jobs/<jobnum*> and restart the daemon on the node.
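The manual cleanup step above could be scripted.  What follows is only a
minimal sketch.  The function name clear_mom_job_files and its parameters
are hypothetical.  The matching rule is also an assumption about how
pbs_mom names its per-job files: it accepts names equal to the job number
or starting with the job number plus a dot.  A bare "<jobnum>*" glob
would also sweep up job 1234's files when cleaning up job 123.  The
default is a dry run.

```python
import os
import shutil


def clear_mom_job_files(job_num, jobs_dir="/var/spool/PBS/mom_priv/jobs",
                        dry_run=True):
    """Remove leftover pbs_mom entries for one job number (hypothetical helper).

    ASSUMPTION: per-job entries are named either exactly job_num or
    job_num followed by a dot and a suffix.  This is tighter than a plain
    "<jobnum>*" glob, so job 12 does not also match job 123.
    Returns the sorted list of matching paths.  Nothing is deleted
    unless dry_run is False.
    """
    if not job_num or os.sep in job_num:
        raise ValueError("job_num must be a bare job number such as '123'")

    matches = sorted(
        os.path.join(jobs_dir, name)
        for name in os.listdir(jobs_dir)
        if name == job_num or name.startswith(job_num + ".")
    )

    if not dry_run:
        for path in matches:
            # Real directories are removed recursively; files and symlinks are unlinked.
            if os.path.isdir(path) and not os.path.islink(path):
                shutil.rmtree(path)
            else:
                os.remove(path)
    return matches
```

For example, clear_mom_job_files("123") only lists what would be removed.
clear_mom_job_files("123", dry_run=False) actually deletes those entries.
Run it while pbs_mom is down.  Restarting the daemon afterwards is left to
whatever method the site normally uses.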

This is becoming a very serious issue for us, and may require that I
downgrade the entire cluster to RedHat 6.2 and/or a 2.2 kernel.  I'm not
anxious to do this for several reasons.

Is anyone on this list running a similar configuration, or does anyone
have any experience with this problem?

 _\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_
| Roger L. Smith            Phone:662-325-3625      roger at ERC.MsState.Edu |
| Systems Administrator     FAX:  662-325-7692 WWW.ERC.MsState.Edu/~roger |
|-------------------------------------------------------------------------|
|         Mississippi State University/National Science Foundation        |
|______Engineering Research Center for Computational Field Simulation_____|





_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf


