[Beowulf] PBS : deleting jobs that were running on a crashed node
Brent M. Clements
bclem at rice.edu
Sun Jan 25 17:56:26 EST 2004
You have to remove the job files from the mom_priv/jobs directory on
each MOM the job was running on, and also from the server_priv/jobs
directory on the PBS server.
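
A rough sketch of that cleanup as a shell function follows. It is not a tested procedure for any particular PBS version. The function name remove_job_files, the DRY_RUN switch and the default PBS_HOME path are all made up for illustration, so substitute your site's spool directory. It also assumes the job files are named after the job ID followed by a dot and a suffix. It is likely safest to stop the PBS daemon on the host before deleting anything and to restart it afterwards, so that it does not hold on to a stale copy of the job.

```shell
#!/bin/sh
# Sketch only: remove the leftover files of one job from a PBS spool area.
# PBS_HOME is an assumption; the real location is site-specific.
PBS_HOME="${PBS_HOME:-/var/spool/pbs}"

# remove_job_files JOBID   (hypothetical helper, not part of PBS)
# Deletes entries named JOBID.* from server_priv/jobs and mom_priv/jobs
# under $PBS_HOME, whichever of the two directories exists on this host.
# With DRY_RUN=1 (the default) it only prints what it would delete.
remove_job_files() {
    jobid="$1"
    if [ -z "$jobid" ]; then
        echo "usage: remove_job_files JOBID" >&2
        return 1
    fi
    for dir in "$PBS_HOME/server_priv/jobs" "$PBS_HOME/mom_priv/jobs"; do
        [ -d "$dir" ] || continue          # directory not present on this host
        for f in "$dir/$jobid."*; do
            [ -e "$f" ] || continue        # glob matched nothing
            if [ "${DRY_RUN:-1}" = "1" ]; then
                echo "would remove $f"
            else
                rm -rf -- "$f"
            fi
        done
    done
}
```

Run it on the PBS server and on each MOM the job used, first as "remove_job_files 123" to see what matches, then with DRY_RUN=0 to actually delete. The trailing dot in the pattern keeps job 123 from also matching job 1234.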
Linux Technology Specialist
On Sun, 25 Jan 2004, Shriram R wrote:
> We have a 24 node/48 procs linux cluster running
> Redhat. The queueing system that we use is PBS. One
> of the nodes, node15, conked out completely and is not
> "pbsnodes -a" shows the state of the node15 as "down".
> However, jobs which had been running on node15 still
> show up when I do a "qstat".
> I tried to use "qdel" and "qsig" to delete these jobs,
> but the server complains that it is unable to contact
> pbs_mom, which is obvious since node15 is down.
> Can someone tell me how do I delete these jobs from
> the output of "qstat" ?
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf