[Beowulf] Odd SuperMicro power off issues
Bogdan.Costescu at iwr.uni-heidelberg.de
Mon Dec 8 08:29:45 EST 2008
On Mon, 8 Dec 2008, Chris Samuel wrote:
> Very occasionally we find one of our Barcelona nodes with
> a SuperMicro H8DM8-2 motherboard powered off. IPMI reports
> it as powered down too.
> No kernel panic, no crash, nothing in the system logs.
So IPMI still works? Then this is _not_ like yanking the power cable,
in which case IPMI would stop working as well.
I've seen this exact behaviour (computer is off, IPMI works and
reports that the computer is off) triggered by computational
loads on SuperMicro H8QC8 boards. I had several such nodes and was able
to swap power supplies between them: the problem moved with the power
supplies, so exchanging the "faulty" ones made this behaviour disappear.
There is no Fluent running here, but other codes like Gromacs that are
known to load the system quite well. The power supplies are supposed to
deliver a max. of 1 kW for a system with 4 Opteron 875, 8 GB RAM and 2
internal disks. The "turning off" behaviour was also quite random,
sometimes appearing within an hour, sometimes taking hours or days; it
started to appear about 5-6 months after the nodes were purchased. I
still have one node where this occurs so rarely (about once a month)
that it's not accepted as grounds for an exchange ;-(
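
For anyone who wants to put numbers on how often a node drops out, here
is a rough sketch of polling the BMC and recording on -> off
transitions. It is only an illustration, not something I have running:
the function names, the host/user/password-file arguments and the
choice of the "lan" interface are all assumptions, and it relies on
ipmitool printing "Chassis Power is on" / "Chassis Power is off" for
"chassis power status".

```python
import subprocess


def parse_power_status(output):
    """Map ipmitool's 'Chassis Power is on/off' line to True/False.

    Returns None when the text is not recognised (e.g. an error message).
    """
    text = output.strip().lower()
    if text.endswith("is on"):
        return True
    if text.endswith("is off"):
        return False
    return None


def query_power_status(host, user, password_file, interface="lan"):
    """Ask the BMC for the chassis power state (hypothetical helper).

    Returns True/False, or None if the BMC did not answer, which would
    point to a lost standby supply rather than a plain power-off.
    """
    cmd = [
        "ipmitool", "-I", interface, "-H", host, "-U", user,
        "-f", password_file, "chassis", "power", "status",
    ]
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return None
    if result.returncode != 0:
        return None
    return parse_power_status(result.stdout)


def unexpected_power_offs(samples):
    """Return timestamps at which the state went from on to off.

    samples is an iterable of (timestamp, state) pairs, where state is
    True, False or None (None = BMC unreachable, ignored for transitions).
    """
    events = []
    previous = None
    for timestamp, state in samples:
        if previous is True and state is False:
            events.append(timestamp)
        if state is not None:
            previous = state
    return events
```

Called from cron every few minutes per node and compared against the
job logs, the collected transition times would show whether the
power-offs follow heavy load and whether they move when the power
supplies are swapped.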
IWR, University of Heidelberg, INF 368, D-69120 Heidelberg, Germany
Phone: +49 6221 54 8240, Fax: +49 6221 54 8850
E-mail: bogdan.costescu at iwr.uni-heidelberg.de