Environment monitoring

Robert G. Brown rgb at phy.duke.edu
Wed Oct 1 08:37:44 EDT 2003


On Wed, 1 Oct 2003, Leopold Palomo Avellaneda wrote:

> On Tuesday 30 September 2003 22:23, Rocky McGaugh wrote:
> > Don't overlook lm_sensors+cron
> >
> Why?

On a system equipped with an internal sensor, lm_sensors can often read
e.g. core CPU temperature on the system itself.  A polling cron script
can then read this and take action, e.g. initiate a shutdown if it
exceeds some threshold.
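
A minimal sketch of such a polling script, in Python, might look like the
following.  Everything specific in it is an assumption for illustration:
the sensor path (the sysfs hwmon layout of later kernels, which reports
millidegrees C, and which your system may or may not have), the 70 C
threshold, and the --shutdown flag, which is a made-up switch so that the
script only reports unless you explicitly ask it to act.

```python
#!/usr/bin/env python3
"""Cron-driven temperature check: a sketch, not a drop-in tool."""
import subprocess
import sys

# ASSUMPTIONS: both values are placeholders; adjust for your hardware.
# The path follows the sysfs hwmon convention (integer millidegrees C).
SENSOR_PATH = "/sys/class/hwmon/hwmon0/temp1_input"
THRESHOLD_C = 70.0


def read_temp_c(path):
    """Return degrees C from a file holding an integer in millidegrees C."""
    with open(path) as f:
        return int(f.read().strip()) / 1000.0


def too_hot(temp_c, threshold_c=THRESHOLD_C):
    """True if the reading exceeds the threshold."""
    return temp_c > threshold_c


def main(argv):
    try:
        temp = read_temp_c(SENSOR_PATH)
    except (OSError, ValueError) as err:
        print(f"cannot read {SENSOR_PATH}: {err}", file=sys.stderr)
        return
    if too_hot(temp):
        # cron mails anything a job prints to the job's owner.
        print(f"CPU at {temp:.1f} C, above {THRESHOLD_C:.1f} C")
        if "--shutdown" in argv:  # hypothetical flag: dry run by default
            subprocess.run(["/sbin/shutdown", "-h", "now"], check=False)


if __name__ == "__main__":
    main(sys.argv[1:])
```

Installed under a hypothetical name such as /usr/local/sbin/tempcheck.py,
it could be run every five minutes from root's crontab with a line like

  */5 * * * * /usr/local/sbin/tempcheck.py --shutdown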

There are good and bad things about this.  A good thing is that it addresses
the real problem -- overheating in the system itself -- and not room
temperature.  CPUs can overheat because of a fan failure while the room
remains cold, and a sensors-driven poweroff can then save your hardware
on a node-by-node basis.

The bad thing is that it does NOT give you any sort of measure of room
temperature per se, although if you have the poweroff script send you
mail first, getting deluged with N messages as the entire cluster shuts
down would be a good clue that your room cooling failed:-).  Also,
lm_sensors has the API from hell.  In fact, I would hardly call it an
API.  One has to pretty much craft a polling script on the basis of each
supported sensor independently, which requires you to know WAY more than
you ever wanted to about the particular sensor your system may or may
not have.

Alas, if only somebody would give the lm_sensors folks a copy of a good
book on XML for Christmas, and they decided to take the monumental step
of converting /proc/sensors into a single XML-based file with the
RELEVANT information presented in toplevel tags like 

  <cpu_temp id="0" units="C">50.4</cpu_temp> 

and the irrelevant information presented in tags like

  <hardware><name>lm78</name><version>1.22a</version></hardware>

then we could ALL reap the fruits of their labor without needing a copy
of the lm78 version 1.22a API manual and having to write an application
that supports each of the sensors THROUGH THEIR INTERFACE one at a
time...;-)
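
To illustrate how little code a consumer would then need, here is a
sketch in Python.  Note that the XML format is the wished-for one above,
not anything lm_sensors actually emits, and that the enclosing <sensors>
root element and the cpu_temps() helper are assumptions added only so
that the sample parses.

```python
import xml.etree.ElementTree as ET

# HYPOTHETICAL sample: the two tags come from the text above; the
# <sensors> wrapper is an assumption, since XML needs a single root.
SAMPLE = """<sensors>
  <cpu_temp id="0" units="C">50.4</cpu_temp>
  <hardware><name>lm78</name><version>1.22a</version></hardware>
</sensors>"""


def cpu_temps(xml_text):
    """Map each cpu_temp id to its reading, ignoring the hardware details."""
    root = ET.fromstring(xml_text)
    return {el.get("id"): float(el.text) for el in root.iter("cpu_temp")}


print(cpu_temps(SAMPLE))  # prints {'0': 50.4}
```

The point is that the parser never has to look at the <hardware> block,
so the same few lines would work no matter which sensor chip is
underneath.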

   rgb

-- 
Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu



_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf


