[Beowulf] Node Drop-Off
gerald.davies at gmail.com
Sun Nov 12 16:40:36 EST 2006
On 11/12/06, Tim Moore <twm at tcg-hsv.com> wrote:
> Hello All -
> I have a compute node that has started dropping off. When I say drop
> off, I mean the node (while running a job) will lose all connectivity
> and the machine does not respond. I have viewed the logs and can find
> no reason for the node to cease functioning. Let me state that this
> behavior did not occur until after a processor upgrade, BIOS upgrade and
> OS upgrade. I went in to the BIOS and made a few changes that seemed to
> prolong it even though its occurrence was mostly random. If I leave the
> node idle, it will run for days.
> Has anyone ever seen such behavior?
I've seen that with faulty hardware, but then you've changed a few things.
If you're sure it's not the code or the OS, then just take a spare node
and try out the things you've changed step by step: processor, BIOS,
memory (?).
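
In case it helps, here is a rough sketch of that step-by-step approach, assuming the changes are applied cumulatively to a spare node and the job is rerun after each one. The component labels, the "old"/"new" states and the node_survives callback are all made up for illustration; in practice that check is you running the job and watching whether the node stays up.

```python
# Hypothetical labels for the things that changed; "memory" was only a guess
# in the thread, so drop it if the memory was not touched.
CHANGES = ["processor", "bios", "memory"]


def staged_configs(changes):
    """Yield (change_just_applied, config) pairs for a spare node.

    The first pair is the untouched baseline (change is None); each later
    pair has one more component switched from "old" to "new".
    """
    config = {name: "old" for name in changes}
    yield None, dict(config)
    for name in changes:
        config[name] = "new"
        yield name, dict(config)


def first_failing_change(changes, node_survives):
    """Return the first change after which the node stops surviving the job.

    node_survives is a caller-supplied function (an assumption here) that
    takes a config dict and returns True if the node stayed up.
    Returns "baseline" if even the untouched node fails, None if none fail.
    """
    for name, config in staged_configs(changes):
        if not node_survives(config):
            return name if name is not None else "baseline"
    return None
```

Because the drop-off is intermittent, each step probably needs a long run (or several) before you call it good. A failure at the baseline step would point back at the code or the OS rather than at the upgrades.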
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf