Performance monitoring tool

Robert G. Brown rgb at phy.duke.edu
Wed Aug 6 13:21:02 EDT 2003


On Wed, 6 Aug 2003, White, Adam Murray wrote:

> Hello,
> 
> I am interested in acquiring a good real time cluster performance monitoring tool, which at 
> least displays (dynamically while the program is running) each thread's cpu utilization and 
> memory usage (graphically). Not a postmortem display. Free as well.
> 
> Any help would be much appreciated.

At this time it won't QUITE do what you like, but it is within spitting
distance of it.  Check out:

  xmlsysd

and

  wulfstat

on brahma (http://www.phy.duke.edu/brahma).

xmlsysd is a daemon that runs on a cluster and obtains by a variety of
means statistics of interest on the system.  Some of these it parses
from /proc, others it obtains through system calls.  It is not promiscuous
(it doesn't provide e.g. a complete copy of /proc to clients that
connect to it) but rather offers a digested view that can be throttled
so that one or more "sets" of interesting statistics can be monitored.
This is to keep it lightweight, both on the system it is monitoring and
on the network and client -- it is (literally) a parallel application in
its own right and it isn't a good idea for a monitor application to
significantly compete for any of the resources that might bottleneck a
"production" parallel application.

Its "prepackaged" return sets include load avg (1,5,15 min), memory
(basically the data underlying the "free" command), ethernet network
usage for one or more devices, date/time/cpu information, basically the
kind of data one finds digested at the top of the "top" command or made
available by e.g. xosview and kin in graphical windows.
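
For reference, on Linux the load average figures live in /proc/loadavg,
whose first three fields are the 1, 5 and 15 minute averages.  Here is a
minimal sketch of parsing that file in Python.  The function names are
made up for illustration, and this is not a claim about how xmlsysd
itself reads the data.

```python
def parse_loadavg(text):
    """Parse the contents of /proc/loadavg into a dict.

    Expected layout: "0.20 0.18 0.12 1/80 11206", i.e. three load
    averages, running/total scheduling entities, and the last pid.
    """
    fields = text.split()
    load1, load5, load15 = (float(x) for x in fields[:3])
    running, total = (int(x) for x in fields[3].split("/"))
    return {
        "load1": load1,
        "load5": load5,
        "load15": load15,
        "running": running,
        "total": total,
        "last_pid": int(fields[4]),
    }


def read_loadavg(path="/proc/loadavg"):
    """Read and parse the live file (Linux only)."""
    with open(path) as f:
        return parse_loadavg(f.read())
```

The memory set could be handled the same way from /proc/meminfo, which
is a list of "Name: value kB" lines.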

It also has a "pid" mode where it can monitor running processes.  Here
throttling and filtering are a bit trickier, as one generally does NOT
want to monitor every process running on a system with a supposedly
lightweight tool.  I thus implemented pid selection by means of matching
task name or user name, a mode that returns all "userspace" tasks that
have accumulated more than some cutoff in total time (5 seconds?  I
can't remember), as well as a to-be-rarely-used promiscuous mode that
returns everything it can find including root tasks.
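
To make those selection rules concrete, here is a rough sketch in
Python.  It is not the daemon's code; the Task fields, the function
name and the "non-root means userspace" test are all assumptions made
for illustration.

```python
from dataclasses import dataclass


@dataclass
class Task:
    pid: int
    name: str           # task (command) name
    user: str           # owner of the task
    cpu_seconds: float  # total accumulated cpu time


def select_tasks(tasks, names=(), users=(), min_cpu_seconds=None,
                 promiscuous=False):
    """Pick the tasks worth reporting.

    - promiscuous: return everything, root tasks included
    - names/users: return tasks whose name or owner matches
    - min_cpu_seconds: also return non-root tasks (assumed here to
      mean "userspace") whose accumulated time exceeds the cutoff
    """
    if promiscuous:
        return list(tasks)
    chosen = []
    for t in tasks:
        if t.name in names or t.user in users:
            chosen.append(t)
        elif (min_cpu_seconds is not None and t.user != "root"
              and t.cpu_seconds > min_cpu_seconds):
            chosen.append(t)
    return chosen
```

For example, select_tasks(tasks, min_cpu_seconds=5.0) would drop short
lived shells and all root daemons while keeping long running jobs.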

xmlsysd's returns are in xml, and hence are easy to parse out with any
xml parser for application in anything you like.
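
As an illustration of how little work the parsing is, here is a short
sketch using Python's standard xml.etree.ElementTree.  The sample
document and its tag names are invented for the example; the real
xmlsysd output will be laid out differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical reply; NOT the actual xmlsysd schema.
SAMPLE = """<host name="node01">
  <loadavg><one>0.42</one><five>0.36</five><fifteen>0.30</fifteen></loadavg>
  <memory><total>1048576</total><free>524288</free></memory>
</host>"""


def flatten(xml_text):
    """Return {path: text} for every leaf element in the document.

    Paths are slash separated tag names, e.g. "host/memory/free".
    Repeated sibling tags overwrite one another in this simple version.
    """
    root = ET.fromstring(xml_text)
    out = {}

    def walk(elem, prefix):
        path = f"{prefix}/{elem.tag}" if prefix else elem.tag
        children = list(elem)
        if not children:
            out[path] = (elem.text or "").strip()
        for child in children:
            walk(child, path)

    walk(root, "")
    return out
```

With the sample above, flatten(SAMPLE)["host/memory/free"] gives the
string "524288", ready to be converted and displayed or graphed.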

That's the good news.

The other good news is that wulfstat, the provided client, lets you use
most of these features in a tty/ncurses window.

The bad news is that there is no GUI display with little graphs and the
like.  This is mixed news, really, not necessarily bad.  A tty display
lets you use the pgup and pgdn keys and scroll arrows to page quickly
through a lot of hosts, seeing instantly the full detail (actual
numbers) for each field being monitored -- you might find wulfstat to be
adequate.  If it isn't adequate, though, you'll likely need to write
some sort of client application that polls the daemon at some interval
(I tend to use 5 seconds as the default, but it can be raised, or lowered
to as little as 1 second, depending on how many hosts one wishes to monitor,
again remembering that it is supposed to be lightweight and that it is a
bad idea to run it so fast that the return latency causes the loop to
pile up).
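
One way to keep a polling loop from piling up is to schedule against a
monotonic clock and simply drop any slots that were missed while a slow
poll was in flight.  The sketch below does that in Python; poll_forever
and its parameters are hypothetical names, not part of wulfstat.

```python
import time


def poll_forever(poll_once, interval=5.0, max_polls=None,
                 clock=time.monotonic, sleep=time.sleep):
    """Call poll_once() at most once per interval.

    If a poll takes longer than the interval, the missed slots are
    skipped rather than queued.  Returns the number of skipped slots
    (only reachable when max_polls is set).
    """
    done = 0
    skipped = 0
    next_due = clock()
    while max_polls is None or done < max_polls:
        now = clock()
        if now < next_due:
            sleep(next_due - now)
        poll_once()
        done += 1
        next_due += interval
        now = clock()
        if next_due < now:
            # We fell behind: jump to the next slot still in the future.
            missed = int((now - next_due) // interval) + 1
            next_due += missed * interval
            skipped += missed
    return skipped
```

Here poll_once would be whatever function queries each host and updates
the display.  A steadily growing skip count is a sign that the interval
is too short for the number of hosts being watched.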

This should be pretty easy -- you can actually talk to the daemon with
telnet, so watching it work and testing the api is not a problem.
You've got wulfstat sources to play with (both tools fully GPL).  The
daemon returns XML, which is easy to parse out.  Finally, there are a
fair number of tools or libraries that you can pipe this output into to
generate graphs, either on the web or some other console. One day I'll
actually write such a tool myself, but wulfstat proved so adequate for
most of what we use it for that I haven't been able to justify advancing
the project to the top of the triage-heap of bloody and neglected
projects that fill my life:-).  If you do write one, feel free to do so
collaboratively and donate it back to the project so we can all share,
although of course the GPL wouldn't require this as far as I can see for
clients not derived from wulfstat code or that you write for yourself.
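
Since the daemon can be driven by hand over telnet, a client needs
little more than a plain TCP socket.  Here is a minimal sketch in
Python.  The function name is hypothetical, and the port, command
string and closing tag must be taken from the xmlsysd documentation or
from a telnet session; none of them are specified here.

```python
import socket


def query_daemon(host, port, command, end_marker, timeout=5.0):
    """Send one command line and return the reply as text.

    Reads until end_marker (for instance the closing tag of the reply)
    has been seen or the peer closes the connection.  All four
    arguments are placeholders to be filled in by the caller.
    """
    marker = end_marker.encode("ascii")
    buf = b""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii") + b"\n")
        while marker not in buf:
            data = sock.recv(4096)
            if not data:
                break  # connection closed before the marker arrived
            buf += data
    return buf.decode("utf-8", errors="replace")
```

The returned text can be handed straight to an XML parser, and the
timeout keeps one dead node from stalling the whole polling loop.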

xmlsysd and wulfstat have been in "production" use locally for some
time, but they are still probably beta level code because most people
use ganglia with its web-based displays.  Personally I think
xmlsysd/wulfstat provide a pretty rich set of monitor options (and
actually are derived from code I originally wrote and was using somewhat
before the ganglia project was begun, so I can't be accused of foolishly
duplicating an existing project:-).  If you have any problems with them
I will cheerfully fix them, and if you have any ideas for additions or
improvements that wouldn't drive me mad timewise to implement, I will
cheerfully add them.

   rgb

> 
> Regards,
> A. M. White
> 
> ######################################################
> Adam M. White
> University of New Brunswick Saint John
> http://www.unbsj.ca/sase/csas
> m0ukb at unb.ca
> ######################################################
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
> 

-- 
Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu


