[Beowulf] Node Drop-Off
hahn at physics.mcmaster.ca
Mon Nov 13 00:15:19 EST 2006
> I have a compute node that has started dropping off. When I say drop off, I
> mean the node (while running a job) will lose all connectivity and the
> machine does not respond. I have viewed the logs and can find no reason for
> the node to cease functioning.
if you connect a console to such a node, has it simply panicked?
> Has anyone ever seen such behavior?
I have the occasional node which turns itself off under load.
the IPMI reports power being off, so it's distinct from panics.
the IPMI system event log (SEL) doesn't show any reason.
we (and the vendor) regard this as grounds for repair (usually
the power supply).
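
a minimal sketch of how that distinction could be checked, assuming
ipmitool is installed and the node's BMC answers over the network.
the function names, the BMC hostname and the credentials are all
hypothetical, and the "lanplus" interface is an assumption (older BMCs
may need "lan"). it only reads the chassis power state and compares it
with whether the node still answers on the network.

```python
import subprocess


def classify_node(power_status_output, responds_on_network):
    """Classify a node from ipmitool's 'chassis power status' text.

    ipmitool prints a line such as 'Chassis Power is on' or
    'Chassis Power is off'. A node that is powered on but silent is
    a candidate for a hang or panic; a node that is powered off
    points at hardware (for example the power supply).
    """
    text = power_status_output.strip().lower()
    if text.endswith("is off"):
        return "powered-off"
    if text.endswith("is on"):
        return "up" if responds_on_network else "hung-or-panicked"
    return "unknown"


def bmc_power_status(bmc_host, user, password):
    """Ask the BMC for the chassis power state (hypothetical helper)."""
    cmd = [
        "ipmitool", "-I", "lanplus",  # assumption: IPMI 2.0 capable BMC
        "-H", bmc_host, "-U", user, "-P", password,
        "chassis", "power", "status",
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

usage would be something like
classify_node(bmc_power_status("node17-bmc", "admin", "secret"), False),
where the hostname and credentials are placeholders. "ipmitool sel list"
against the same BMC dumps the event log, though as noted it may show
nothing useful.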
regards, mark hahn.
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf