[Beowulf] Re:hardware question: building a cluster node/ student
Lombard, David N
dnlombar at ichips.intel.com
Fri Jul 27 10:44:17 EDT 2007
On Thu, Jul 26, 2007 at 08:48:35AM -0700, David Mathog wrote:
> "Nathan Moore" <ntmoore at gmail.com> wrote
>
> > Earlier this summer, the case fan on one of the machines failed, and the
> > result seems like a cooked motherboard (erratic errors with the integrated
> > NIC).
>
> There should be an automatic shutdown script running to detect
> temperature events and shut down the machine before it is damaged.
> This is what I use on some machines:
>
> ftp://saf.bio.caltech.edu/pub/software/linux_or_unix_tools/sensor_monitor.tar.gz

Depending on the board and kernel, ACPI will also provide these services. On
an FC4 (2.6.14) system, I had to do the following to get that to work:

    echo 90 > /proc/acpi/thermal_zone/THRM/polling_frequency
    echo 80:0:70:65:0 > /proc/acpi/thermal_zone/THRM/trip_points

The first echo caused the auto shutdown to work; the second set the values I
wanted, i.e., shutdown at 80C. Some ACPI cognoscenti said the fact that I
had to "manually enable" the polling/shutdown was an error in that version
of the kernel.

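Where the ACPI shutdown cannot be made to work, a userspace check in the
spirit of the monitor script quoted above could look roughly like the sketch
below. It is not that script. The zone name THRM and the 80C limit are taken
from the commands above; the function names, the ZONE_DIR and LIMIT_C
variables, and the assumed "temperature:   54 C" file format are illustrative
and may not match a given board or kernel.

```shell
#!/bin/sh
# Sketch only. ZONE_DIR, LIMIT_C and the function names are hypothetical;
# the "temperature:   54 C" layout of the procfs file is an assumption.

ZONE_DIR=${ZONE_DIR:-/proc/acpi/thermal_zone/THRM}
LIMIT_C=${LIMIT_C:-80}

# Print the integer Celsius value from a line such as "temperature:   54 C".
parse_temp() {
    awk '/^temperature:/ { print $2 }' "$1"
}

# Exit status 0 when the first argument is at or above the second.
too_hot() {
    [ "$1" -ge "$2" ]
}

# Print "shutdown" or "ok"; return non-zero if the zone cannot be read.
check_zone() {
    temp=$(parse_temp "$ZONE_DIR/temperature") || return 1
    [ -n "$temp" ] || return 1
    if too_hot "$temp" "$LIMIT_C"; then
        echo "shutdown"
    else
        echo "ok"
    fi
}
```

Run from cron or a sleep loop, the caller would act on the result, e.g.
`[ "$(check_zone)" = "shutdown" ] && /sbin/shutdown -h now`. Keeping the
decision separate from the action makes it easy to dry-run against a copy of
the temperature file first.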
I discovered all this when I came home to that sickening overly-hot electronics
smell, a case *very* hot to the touch, and the CPU at 104C due to a dead CPU
fan. Happily, it took a licking and kept on ticking.
--
David N. Lombard, Intel, Irvine, CA
I do not speak for Intel Corporation; all comments are strictly my own.