[Beowulf] Scyld system mysteriously locks up
agrajag at dragaera.net
Tue Mar 16 13:49:02 EST 2004
On Mon, 2004-03-15 at 14:49, Eric R Johnson wrote:
> I purchased a 4 node, 8 processor Scyld (version 28) cluster
> approximately 6 months ago. About 5 days ago, it started mysteriously
> locking up on me. Once it is locked up, I can't do anything except
> physically reboot the machine.
> Unfortunately, I am rather new to Linux clusters and, since it worked
> "right out of the box", I have had no experience in troubleshooting.
> Can someone give me an idea of where I should start?
> I have the BIOS on all machines set to do a full memory check on startup
> and the /var/log/messages file shows nothing.
It might be useful to figure out exactly what is locking up. Is it just
the head node, or do the compute nodes lock up as well?
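Because the system log shows nothing, a heartbeat log can help narrow down what stops and when. The sketch below is an assumption on my part, not a Scyld tool. It appends a timestamp to a local file and calls fsync so the entry survives a hard reset. After the reboot, the last line shows roughly when the machine stopped. The path /var/tmp/heartbeat.log and the function names are hypothetical.

```python
import os
import time
from typing import Optional

# Hypothetical location; keep it on a local disk, not on NFS.
HEARTBEAT_PATH = "/var/tmp/heartbeat.log"


def write_heartbeat(path: str = HEARTBEAT_PATH) -> float:
    """Append the current Unix time (whole seconds) and fsync it to disk."""
    now = time.time()
    with open(path, "a") as f:
        f.write(f"{now:.0f}\n")
        f.flush()
        # Push the data past the page cache; buffered writes are lost on a hard lockup.
        os.fsync(f.fileno())
    return now


def last_heartbeat(path: str = HEARTBEAT_PATH) -> Optional[float]:
    """Return the most recent recorded timestamp, or None if nothing was logged."""
    try:
        with open(path) as f:
            lines = [line.strip() for line in f if line.strip()]
    except FileNotFoundError:
        return None
    return float(lines[-1]) if lines else None


def run(interval_s: float = 30.0, path: str = HEARTBEAT_PATH) -> None:
    """Write a heartbeat every interval_s seconds until the process is killed."""
    while True:
        write_heartbeat(path)
        time.sleep(interval_s)
```

Call run() from a background process on each machine where you can run it. After a lockup and reboot, compare each machine's last_heartbeat() to see which one stopped first. The file is never rotated, so this is only meant for short diagnostic runs.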
Have you made any recent changes that might account for it? Or are you
running any new programs that might be stressing the machine in a way it
wasn't stressed before? If it's completely locked (if you can no longer
toggle the Num Lock light on your keyboard, it's completely locked),
then it's either a kernel hang or a hardware issue. If the kernel is the
same and the usage pattern hasn't changed, hardware becomes the more
likely suspect. Hardware can degrade over time, and dying hardware can
be unpredictable.
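The reasoning above can be restated as a small checklist. The sketch below is my own summary, not an official diagnostic procedure. The field names and the three result labels are hypothetical. It assumes that a Num Lock LED that still toggles means the kernel is still running, so the hang is only partial.

```python
from dataclasses import dataclass


@dataclass
class Observations:
    numlock_toggles: bool  # does the keyboard Num Lock LED still respond?
    kernel_changed: bool   # new kernel or kernel modules since it last worked?
    usage_changed: bool    # new programs or heavier workloads since then?


def likely_cause(obs: Observations) -> str:
    """Map the observations from the reply to a rough suspect."""
    if obs.numlock_toggles:
        # Kernel still services the keyboard, so it is not a complete lockup.
        return "partial hang"
    if not obs.kernel_changed and not obs.usage_changed:
        # Nothing changed in software or workload, so suspect degrading hardware.
        return "hardware"
    # A complete lockup after a change could be a kernel hang or a newly exposed hardware fault.
    return "kernel or hardware"
```

For example, `likely_cause(Observations(False, False, False))` returns `"hardware"`. This matches the case in the reply where a full lockup occurs with an unchanged kernel and an unchanged workload.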
You may also consider contacting Scyld, and possibly the hardware
manufacturer for help diagnosing the problem.
Beowulf mailing list, Beowulf at beowulf.org