[Beowulf] Re: [Linux-HA] Couldn't get watchdog to work

Alex Vrenios alex at dsrlab.com
Tue Dec 28 12:14:16 EST 2004


> -----Original Message-----
> Paul Chen wrote:
> > Both nodes did restart 
> > heartbeat but none of them reboot or shut down. Am I doing 
> > something wrong?
> >
> Alan Robertson wrote:
> The watchdog timer will only kill the system if heartbeat goes insane.
> It didn't.  So, the watchdog timer is happy.
> 
> At this point in time, the watchdog timer is not a 
> replacement for a STONITH device.
>
Which is exactly what I am looking into (the STONITH device)...

I see two solutions, one hardware and one software. The hardware solution
looks expensive, but I believe the software solution will help Mr. Chen
(above). I would appreciate comments.

I would have my "backup" system execute a command as part of its attempts to
assume the identity, responsibilities and resources of the "primary" system.
The command is run from backup, as follows:

   root@backup> ssh root@primary shutdown -h now

This will not work in all cases: it needs the primary to be reachable over
the network and still able to accept an ssh login. It should work in cases
like the above, though. A hardware solution is more general, but it doesn't
hurt to run this command in any case.
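
As a rough sketch only, the ssh command above could be wrapped so that it
cannot stall the takeover when the primary is unreachable. This assumes
key-based root login from backup to primary is already set up. The function
name fence_primary and the SSH_CMD override are hypothetical names I made up
for illustration; BatchMode and ConnectTimeout are standard OpenSSH options.

```shell
#!/bin/sh
# Hypothetical helper: ask a peer node to halt itself over ssh.
# Assumes passwordless (key-based) root login from backup to the peer.
# SSH_CMD is an illustrative override so the call can be dry-run with echo.
fence_primary() {
    peer="${1:-primary}"
    # BatchMode=yes: fail instead of prompting for a password.
    # ConnectTimeout=10: give up after 10 seconds if the peer is unreachable.
    ${SSH_CMD:-ssh} -o BatchMode=yes -o ConnectTimeout=10 \
        "root@${peer}" shutdown -h now
}
```

The backup's takeover script might then call something like
"fence_primary primary || echo 'could not halt primary over ssh' >&2".
The exit status is only a hint: ssh can fail even though the primary is
still running, so this does not replace a real STONITH device.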

Alex Vrenios
DSRLab


_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf