Environment monitoring

Robert G. Brown rgb at phy.duke.edu
Wed Oct 1 10:33:29 EDT 2003


On Wed, 1 Oct 2003, Leopold Palomo wrote:

> Ok. I was a bit surprised by your sentence. I know that lm_sensors is not
> perfect, but it does its job. Ok, I don't think that using lm_sensors to try
> to calculate the T of the room is a bit excessive.
> 
> About the xml,... well, ok, it would be a nice feature, but as plain text, 
> knowing your hardware it's so good, too.
> 

Sorry, I tend to get distracted and rant from time to time (even though,
as Greg noted, sometimes the rants are of lesser quality:-). In this
particular case the rant is really directed at all of /proc, but the
sensors interface is the worst example of the lot.

I'm "entitled" to rant because I've written two tools (procstatd and
xmlsysd) that parse out all sorts of data, including sensors data in
procstatd, and provide it to clients for monitoring purposes.  Even
my daemon wasn't the first to do this, but I think it was one of the
first two that functioned as a binary without running a shell script or
the like on each node.  procstatd actually predated ganglia by a fair
bit, FWIW.

On the basis of this fairly extensive experience I can say that
lm_sensors output is very poorly organized from the perspective of
somebody trying to write a general purpose parser to extract the data it
provides.  In particular, it uses a directory tree structure where the
PARTICULAR sensors interface that you have appears as part of the path,
and where what you find underneath that path depends on the particular
sensor that you've got as well.

Hopefully it is obvious how Evil this makes it from the point of view of
somebody trying to write a general purpose tool to parse it.  Basically,
to write such a tool one has to go through the lm_sensors sources and
reverse engineer each interface it supports to determine what is
produced and where, one at a time.  This is more than slightly nuts.
What do "most" sensors provide?  Fields like cpu temperature (for cpus
0-N), fan speed (for fans 0-N), core voltage (for lines 0-N).

Sure, some provide more, some provide less, but what are we discussing?
The monitoring of cpu temperature, under the reasonable assumption that
either we have a sensor that provides it or we don't, and that we really
don't give a rodent's furry tuchis WHICH sensor we have as long as it
gives us "CPU Temperature", preferably for every CPU.

So a good API is one that has a single file entitled /proc/sensors, and
in that file one finds things like:

<?xml version="1.0"?>
<sensors>
  <cpu_temperature id="0" units="C">54.2</cpu_temperature>
  <cpu_temperature id="1" units="C">51.7</cpu_temperature>
  <fan_speed id="0"....

  <hardware>
    <type>lm78</type>
    <blah>...</blah>
    <blah>...</blah>
  </hardware>
</sensors>

I can write code to parse this in a few minutes of work, literally, and
the same code will work for all interfaces that lm_sensors might
support, and I don't need to know the interface the system has in it
beforehand (although with the knowledge I might add some advanced
features if it supports them).  Presenting the knowledge is also trivial
-- a web interface might be as sparse as a reader/parser and/or a DTD.
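
A minimal sketch of what "a few minutes of work" could look like, in
Python.  The proposed /proc/sensors file does not exist, so the sample
text is embedded in the script; the fan_speed value, its "RPM" units and
the function names read_sensors and hardware_type are assumptions made
up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical contents of the proposed /proc/sensors file.
# The fan_speed element and its units are invented for this sketch.
SAMPLE = """<?xml version="1.0"?>
<sensors>
  <cpu_temperature id="0" units="C">54.2</cpu_temperature>
  <cpu_temperature id="1" units="C">51.7</cpu_temperature>
  <fan_speed id="0" units="RPM">4800</fan_speed>
  <hardware>
    <type>lm78</type>
  </hardware>
</sensors>
"""


def read_sensors(text):
    """Return {tag: {id: (value, units)}} for every reading under <sensors>."""
    root = ET.fromstring(text)
    readings = {}
    for elem in root:
        if elem.tag == "hardware":
            continue  # descriptive block, not a reading
        entry = (float(elem.text), elem.get("units"))
        readings.setdefault(elem.tag, {})[int(elem.get("id"))] = entry
    return readings


def hardware_type(text):
    """Return the chip type if the file reports one, else None."""
    return ET.fromstring(text).findtext("hardware/type")


readings = read_sensors(SAMPLE)
for cpu, (temp, units) in sorted(readings["cpu_temperature"].items()):
    print(f"cpu {cpu}: {temp} {units}")
```

Nothing in read_sensors depends on which chip produced the numbers; the
hardware block is only consulted if a client wants chip-specific extras.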

Compare to parsing something like (IIRC)

  /proc/sensors/device-with-a-bunch-of-numbers/subunit/field

where the path that you find under a specific device-with-numbers
depends on the toplevel value on a device by device basis, and the
contents of field can as well.  Yech.
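
To make the pain concrete, here is a rough Python sketch of what a
client of such a tree ends up looking like.  Everything in it is
hypothetical: the chip names (chipA, chipB, chipC), the file names and
the column positions are invented and are NOT the real lm_sensors
layout, and the tree is built in a temporary directory rather than read
from /proc.

```python
import os
import tempfile

# Hypothetical per-chip knowledge the client must carry around: which file
# holds the cpu temperature and which whitespace-separated column is the
# current reading.  All names and positions are invented for this sketch.
CHIP_TABLE = {
    "chipA": ("temp1", 2),
    "chipB": ("cpu_temp", 0),
}


def cpu_temperature(root):
    """Scan root/<chip>-<address>/ and return {device_dir: temperature}."""
    found = {}
    for device in sorted(os.listdir(root)):
        chip = device.split("-")[0]
        if chip not in CHIP_TABLE:
            continue  # unknown chip: unreadable until someone adds an entry
        filename, column = CHIP_TABLE[chip]
        with open(os.path.join(root, device, filename)) as f:
            found[device] = float(f.read().split()[column])
    return found


def make_fake_tree(root):
    """Create a made-up device tree with three differently shaped chips."""
    layout = {
        "chipA-0290/temp1": "60.0 50.0 54.2\n",
        "chipB-004c/cpu_temp": "51.7\n",
        "chipC-0048/t0": "49.0\n",
    }
    for rel, content in layout.items():
        path = os.path.join(root, rel)
        os.makedirs(os.path.dirname(path))
        with open(path, "w") as f:
            f.write(content)


with tempfile.TemporaryDirectory() as root:
    make_fake_tree(root)
    temps = cpu_temperature(root)
    print(temps)
```

The third chip has a perfectly good temperature in it, but the client
cannot see it until CHIP_TABLE grows another entry, which is exactly the
one-at-a-time reverse engineering complained about above.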

And Rocky, hiding the problem with gmond is fine, but then it puts the
burden for writing an API for the API on the poor people that have to
support the gmond interface.  Yes they can (and I could) do this.  I
personally refuse.  They obviously have gritted their teeth and done so.
The correct solution is clearly to redo the lm_sensors interface itself
so that it is organized as the above indicates.

Which criticism, by the way, applies to a LOT of /proc, which currently
looks like it was organized by a bunch of wild individualists who have
handled every emergent subfield by overloading its data in a single
"field" line, usually with documentation only in the form of reading
procps or kernel source.  Just because this is actually true doesn't
excuse it.  Parsing the contents of /proc is maddening for just this
reason, and the cost is a lot of needless complexity, pointless bugs and
upgrade incompatibilities for many people.  Putting the data into
xml-wrapped form would be a valuable exercise in the discipline of
structuring data, for the most part.
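
As one small illustration (a sketch, not a proposal for an actual kernel
format), here is the aggregate "cpu" line of /proc/stat wrapped into
labelled elements in Python.  The sample numbers are made up, the
element names are my own choice, and only the first four columns (user,
nice, system, idle) are labelled; later kernels append more columns.

```python
import xml.etree.ElementTree as ET

# Sample aggregate "cpu" line in the style of /proc/stat; numbers are made up.
SAMPLE_LINE = "cpu  2255 34 2290 22625563"

# Labels for the first four positional columns.  Any further columns on the
# line are ignored by zip() below rather than guessed at.
FIELDS = ("user", "nice", "system", "idle")


def wrap_cpu_line(line):
    """Turn the positional cpu line into an element with one child per field."""
    parts = line.split()
    if parts[0] != "cpu":
        raise ValueError("not an aggregate cpu line")
    cpu = ET.Element("cpu")
    for name, value in zip(FIELDS, parts[1:]):
        ET.SubElement(cpu, name).text = value
    return cpu


elem = wrap_cpu_line(SAMPLE_LINE)
xml_text = ET.tostring(elem, encoding="unicode")
print(xml_text)
```

The point is only that once each number carries its own label, a client
no longer has to know column positions from procps or kernel source.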

   rgb

> Best Regards.
> 
> PS: How about the pdf, ps, etc?

I'll try to work on this as soon as I can.  My task list for the day
looks something like a) debug/fix some dead nodes; b) add a requested
feature/view to wulfstat (that has been on hold for a week or more :-( );
c) work on a bunch of documents associated with teaching and curriculum
at Duke (sigh); d) about eight more tasks, none of which I will likely
get to, including work on my research.

However, this is about the third or fourth time people have requested
a "fix" for the ps/pdf/font issue (with acroread it can even fail
altogether to read the document -- presumably some gs/acrobat
incompatibility where I use gs-derived tools) so I'll try very hard to
craft some sort of fix by the weekend.

-- 
Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu
