Environment monitoring
Donald Becker
becker at scyld.com
Wed Oct 1 12:36:26 EDT 2003
On Wed, 1 Oct 2003, Rocky McGaugh wrote:
> On Wed, 1 Oct 2003, Robert G. Brown wrote:
> > Alas, if only somebody would give the lm_sensors folks a copy of a good
> > book on XML for christmas, and they decided to take the monumental step
...
> > then we could ALL reap the fruits of their labor without needing a copy
> > of the lm78 version 1.22a API manual and having to write an application
> > that supports each of the sensors THROUGH THEIR INTERFACE one at a
> > time...;-)
>
> We have that. lm_sensors+cron+gmond.
I think you missed RGB's point. The lm_sensors implementation sucks.
Sure, any one specific implementation can be justified. But having each
implementation use a different output and calibration shows that this
is not an architecture, just a collection of hacks.
The usual reply at this point is "just update the user-level script for
the new motherboard type". Yup... and you should probably update the
constants in your programs' delay loops at the same time.
With lm_sensors you can get a one-off hack working, but cannot implement
a general case. Compare this to IPMI, which presents the same
information. IPMI has a crufty design and ugly implementations, but it
is an architected system. With care you can implement and deploy code
that works on a broad range of current and future machines.
While I'm on the soapbox, gmond deserves its own mini-butane-torch
flame.
I implemented the translator from Beostat (our status/statistics
subsystem) to gmond (per-machine information for Ganglia), so I have a
pretty good side-by-side comparison.
First, how did they choose what statistics to present?
Apparently just because the numbers were there.
What is the point of using an XML DTD if it is just used to
package undefined data types? A wrapper around a wrapper...
Example metric lines:
<METRIC NAME="load_fifteen" VAL="1.41" TYPE="float" UNITS="" TN="246"
TMAX="950" DMAX="0" SLOPE="both" SOURCE="gmond"/>
<METRIC NAME="proc_total" VAL="77" TYPE="uint32" UNITS="" TN="154"
TMAX="950" DMAX="0" SLOPE="both" SOURCE="gmond"/>
Not only are these metric types not enumerated anywhere, they are made
more confusing by abbreviated names that come with no definition.
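To make the complaint concrete, here is a minimal sketch in Python that parses the two metric lines quoted above. The <HOST> wrapper and the CASTS table are assumptions added only so the fragment is well-formed and the values can be converted; they are not taken from the gmond DTD. Note that the consumer gets a storage type and a value, but nothing that says what the number means.

```python
import xml.etree.ElementTree as ET

# The two METRIC elements quoted in the post, wrapped in a hypothetical
# <HOST> element so the fragment parses on its own.
SAMPLE = """<HOST NAME="node0">
<METRIC NAME="load_fifteen" VAL="1.41" TYPE="float" UNITS="" TN="246"
 TMAX="950" DMAX="0" SLOPE="both" SOURCE="gmond"/>
<METRIC NAME="proc_total" VAL="77" TYPE="uint32" UNITS="" TN="154"
 TMAX="950" DMAX="0" SLOPE="both" SOURCE="gmond"/>
</HOST>"""

# Assumed mapping from the TYPE strings seen above to Python converters.
# Unknown types fall back to str, since the set of types is not enumerated.
CASTS = {"float": float, "uint32": int}


def parse_metrics(xml_text):
    """Return {name: (value, units)} for every METRIC element."""
    root = ET.fromstring(xml_text)
    out = {}
    for m in root.iter("METRIC"):
        cast = CASTS.get(m.get("TYPE"), str)
        out[m.get("NAME")] = (cast(m.get("VAL")), m.get("UNITS"))
    return out


metrics = parse_metrics(SAMPLE)
for name, (value, units) in metrics.items():
    # UNITS is empty for both metrics, so the meaning must be guessed
    # from the abbreviated name alone.
    print(name, value, repr(units))
```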
To tie both together: What is "proc_total"?
Number of processors? Number of processes? Does it count system
daemons? It seems to be the useless number "ps x | wc", rather than
the number of end user, application processes.
Many statistics are only usable when used/presented as a set. Why split
the numbers into multiple elements? It just multiplies the size and
parsing load.
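A rough sketch of the size cost, in Python. The metric names load_one and load_five, the combined name load_avg, and all of the values are illustrative assumptions, not output captured from gmond; the attribute list is copied from the load_fifteen line above. It compares three separate elements against one element that carries the whole set.

```python
import xml.etree.ElementTree as ET

# Attribute tail copied from the load_fifteen example in the post.
ATTRS = ('TYPE="float" UNITS="" TN="246" TMAX="950" DMAX="0" '
         'SLOPE="both" SOURCE="gmond"')

# Illustrative values only; the first two names are assumed siblings
# of load_fifteen.
loads = {"load_one": "1.52", "load_five": "1.47", "load_fifteen": "1.41"}

# One element per number: the attribute tail is repeated for each value.
split = "".join(
    '<METRIC NAME="%s" VAL="%s" %s/>' % (name, val, ATTRS)
    for name, val in loads.items()
)

# One element for the set: the values share a single attribute tail.
# "load_avg" is a hypothetical name for the combined metric.
joined_vals = " ".join(loads.values())
combined = '<METRIC NAME="load_avg" VAL="%s" %s/>' % (joined_vals, ATTRS)

# Reading the set back takes one element lookup and one split.
elem = ET.fromstring(combined)
load_set = [float(v) for v in elem.get("VAL").split()]

print(len(split), "bytes split vs", len(combined), "bytes combined")
```

The same reasoning applies to any group of numbers that is only meaningful together: the per-element overhead is paid once per set instead of once per value.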
____
Background: Beostat is our status/statistics interface that we published
3+ years ago. It exports interfaces at multiple levels:
    network protocol
    shared memory table
        only for very performance-sensitive programs, such as schedulers
    dynamic library
        the preferred interface for programs
    command output
Thus Beostat is an infrastructure subsystem, rather than a single-purpose
stack of programs.
--
Donald Becker becker at scyld.com
Scyld Computing Corporation http://www.scyld.com
914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system
Annapolis MD 21403 410-990-9993