[Beowulf] PBS : deleting jobs that were running on a crashed node

Brent M. Clements bclem at rice.edu
Sun Jan 25 17:56:26 EST 2004


You have to remove the job's files from the mom_priv/jobs directory on
each mom the job was running on, as well as from the server_priv/jobs
directory on the PBS server.
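
As a rough illustration of the cleanup described above, here is a minimal
Python sketch that lists, and optionally deletes, the files belonging to one
job under both directories. Several things in it are assumptions rather than
anything stated in this thread: the PBS home path passed in (for example
/var/spool/pbs), the idea that job files start with the job's sequence number
followed by a dot (for example "123."), and the helper names find_job_files
and remove_job_files, which are made up for this example. It only touches the
machine it runs on, so it would need to be run on the server and again on each
mom once that node is reachable. It is probably safest to stop the PBS daemons
first and to do a dry run before deleting anything.

```python
import glob
import os
import shutil


def find_job_files(pbs_home, job_prefix):
    """Return paths in server_priv/jobs and mom_priv/jobs that start with job_prefix.

    pbs_home and the prefix-based file naming are assumptions; check a
    directory listing on your own installation before relying on them.
    """
    matches = []
    for subdir in ("server_priv/jobs", "mom_priv/jobs"):
        pattern = os.path.join(pbs_home, subdir, glob.escape(job_prefix) + "*")
        matches.extend(sorted(glob.glob(pattern)))
    return matches


def remove_job_files(pbs_home, job_prefix, dry_run=True):
    """Print (dry_run=True) or delete the matching job files; return the paths."""
    paths = find_job_files(pbs_home, job_prefix)
    for path in paths:
        if dry_run:
            print("would remove", path)
        elif os.path.isdir(path):
            shutil.rmtree(path)  # some entries may be directories
        else:
            os.remove(path)
    return paths
```

Calling remove_job_files("/var/spool/pbs", "123.") only prints what would be
removed; pass dry_run=False once the list looks right. The trailing dot in the
prefix keeps job 123 from also matching job 1234.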

-B

Brent Clements
Linux Technology Specialist
Information Technology
Rice University


On Sun, 25 Jan 2004, Shriram R wrote:

> Hi,
>
> We have a 24-node/48-processor Linux cluster running
> Red Hat.  The queueing system that we use is PBS.  One
> of the nodes, node15, conked out completely and is not
> restarting.
>
> "pbsnodes -a" shows the state of the node15 as "down".
>
> However, jobs which had been running on node15 still
> show up when I do a "qstat".
>
> I tried to use "qdel" and "qsig" to delete these jobs,
> but the server complains that it is unable to contact
> pbs_mom, which is obvious since node15 is down.
>
> Can someone tell me how to delete these jobs from
> the output of "qstat"?
>
> TIA.
> -shriram
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
>