[Beowulf] Re: PowerEdge SC 1435: Unexplained Crashes.

Rahul Nabar rpnabar at gmail.com
Fri Oct 10 10:54:35 EDT 2008

>Have you checked in the baseboard management log to see if it is
>throwing an error?

Apparently the SC1435 does not have OpenManage. "Simple Computing" is
too simple to warrant that, I was told. They do have DSET to look at
the ESM logs, but not for CentOS or Fedora. Red Hat is their
"validated" [sic] OS; that's the only one they support. So I'm sort of
stuck there.

> Also check on the temperature of the machines.  We
>have had some pretty weird issues with RAM and CPU quirkiness when
>they reach a high internal temperature.  If you can do some polling
>using IPMI on the nodes to record the current temp and fan data over
>time so that you could see what it was at just before a crash you
>might be able to point it to an environmental situation.

I'll try IPMI. I was trying lm_sensors, but apparently it does not have
a driver for this chipset / motherboard combination. Not sure if it's
an AMD Opteron-specific driver issue or a
vendor-not-releasing-motherboard-specs issue (heard both versions on
the net). Has anybody else had success using lm_sensors on the SC1435?
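
For the polling idea quoted above, here is a minimal sketch of what I
have in mind. It assumes ipmitool is installed and can reach the BMC
through the local IPMI driver, which I have not confirmed on the
SC1435. The function names (parse_sdr, read_sensors, poll) and the log
path are made up for illustration, and the parser assumes the usual
pipe-delimited "ipmitool sdr" output with the reading in the last
column, which may differ between boards and ipmitool versions.

```python
import os
import subprocess
import time
from datetime import datetime


def parse_sdr(text):
    """Parse pipe-delimited `ipmitool sdr` output into (sensor, reading) pairs.

    Assumes the sensor name is the first column and the reading is the last.
    """
    readings = []
    for line in text.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) < 2 or not fields[0]:
            continue  # skip blank or malformed lines
        readings.append((fields[0], fields[-1]))
    return readings


def read_sensors(sensor_type):
    """Run `ipmitool sdr type <sensor_type>` and return the parsed readings."""
    result = subprocess.run(
        ["ipmitool", "sdr", "type", sensor_type],
        capture_output=True, text=True, check=True,
    )
    return parse_sdr(result.stdout)


def poll(log_path, interval_s=60, samples=None):
    """Append timestamped temperature and fan readings to log_path.

    With samples=None the loop runs until interrupted.
    """
    taken = 0
    while samples is None or taken < samples:
        stamp = datetime.now().isoformat(timespec="seconds")
        with open(log_path, "a") as log:
            for sensor_type in ("Temperature", "Fan"):
                for name, reading in read_sensors(sensor_type):
                    log.write(f"{stamp}\t{sensor_type}\t{name}\t{reading}\n")
            # Force the sample to disk so the last reading before a
            # crash is not lost in the page cache.
            log.flush()
            os.fsync(log.fileno())
        taken += 1
        if samples is None or taken < samples:
            time.sleep(interval_s)
```

Calling poll("/var/tmp/ipmi_env.log") on each node (hypothetical path)
would leave a tab-separated log whose last lines show the temperature
and fan state just before a crash. Writing to a file on local disk with
fsync is a deliberate choice here, since anything still buffered is
gone once the node locks up.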
