[Beowulf] Re: PowerEdge SC 1435: Unexplained Crashes.
rpnabar at gmail.com
Fri Oct 10 10:54:35 EDT 2008
>Have you checked the baseboard management log to see if it is
>throwing an error?
Apparently the SC1435 does not have OpenManage. "Simple Computing" is
too simple to warrant that, I was told. They do have dset to look at
the ESM logs, but not for CentOS or Fedora. Red Hat is their
"validated" [sic] OS. That's the only one they support. So I'm sort of
>Also check on the temperature of the machines. We
>have had some pretty weird issues with RAM and CPU quirkiness when
>they reach a high internal temperature. If you can do some polling
>using IPMI on the nodes to record the current temp and fan data over
>time, so that you can see what it was just before a crash, you
>might be able to pin it on an environmental situation.
I'll try IPMI. I was trying lm_sensors but apparently it does not have
a driver for this chipset / motherboard combination. Not sure if it's
an AMD Opteron specific driver issue or a
vendor-not-releasing-motherboard-specs issue (heard both versions on
the net). Has anybody else had success using lm_sensors on the SC1435?
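
For reference, here is a minimal sketch of the kind of polling suggested
above. It assumes ipmitool is installed on the node, that the BMC is
reachable in-band through the kernel IPMI driver, and that
"ipmitool sdr type Temperature" / "ipmitool sdr type Fan" print the usual
five pipe-separated columns. Whether the SC1435's BMC actually exposes
these sensors is not confirmed here. The function names, the log path and
the 60 second interval are all hypothetical choices for illustration.

```python
import os
import subprocess
import time
from datetime import datetime


def parse_sdr(text):
    """Turn `ipmitool sdr type ...` output into (name, status, reading) tuples.

    Assumes five pipe-separated columns per sensor line:
    name | sensor id | status | entity | reading. Other lines are skipped.
    """
    rows = []
    for line in text.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) == 5:
            name, _sensor_id, status, _entity, reading = fields
            rows.append((name, status, reading))
    return rows


def poll_once(sensor_type):
    """Run ipmitool once for one sensor type ("Temperature" or "Fan")."""
    result = subprocess.run(
        ["ipmitool", "sdr", "type", sensor_type],
        capture_output=True, text=True, check=True,
    )
    return parse_sdr(result.stdout)


def log_forever(path="ipmi_sensors.log", interval=60):
    """Append timestamped readings to `path` every `interval` seconds.

    The path and interval are hypothetical defaults, not recommendations.
    """
    with open(path, "a") as log:
        while True:
            stamp = datetime.now().isoformat(timespec="seconds")
            for sensor_type in ("Temperature", "Fan"):
                for name, status, reading in poll_once(sensor_type):
                    log.write(f"{stamp}\t{sensor_type}\t{name}\t{status}\t{reading}\n")
            # Push each sample to disk so the last readings survive a hard crash.
            log.flush()
            os.fsync(log.fileno())
            time.sleep(interval)
```

Calling log_forever() on each node (as root, or as a user with access to
the IPMI device) and reading the tail of the log after a crash should show
the last temperature and fan readings before the node went down. Writing
to an NFS mount or shipping the lines to another host may be safer than
local disk if the crash takes the filesystem with it.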