[Beowulf] query: aggregate cluster performance monitoring without multicast

Guy Coates gmpc at sanger.ac.uk
Fri Jan 9 04:54:43 EST 2004

> > I am in the process of trying to get a stopgap performance monitoring
> > system going on a 64 CPU Linux cluster with LSF.

Just use LSF's own built-in monitoring. "lsload -l" will give you the
load, memory, swap usage, etc. of every node in the cluster. You can extend
the data collection with your own scripts (ELIMs, LSF's external load
information managers) to measure whatever else you want. For example, we
monitor temperature and some database load information as well as the
usual machine statistics.
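
As a rough illustration of the elim idea above, here is a minimal sketch
in Python (not the scripts we actually run). It assumes the usual elim
convention of periodically writing one line to stdout of the form
"<count> <name1> <value1> <name2> <value2> ...". The index names "temp"
and "dbload" and the reader callable are hypothetical, and any such names
would also have to be declared in the LSF configuration before LSF will
report them.

```python
import sys
import time
from typing import Callable, Dict, Optional, TextIO


def format_elim_report(indices: Dict[str, float]) -> str:
    """Build one report line: '<count> <name1> <value1> <name2> <value2> ...'.

    Assumption: this is the line format LSF expects from an elim.
    """
    fields = [str(len(indices))]
    for name, value in indices.items():
        fields.extend([name, str(float(value))])
    return " ".join(fields)


def report_loop(
    read_indices: Callable[[], Dict[str, float]],
    interval_s: float = 30.0,
    max_reports: Optional[int] = None,
    out: TextIO = sys.stdout,
) -> None:
    """Write a report every interval_s seconds.

    read_indices is a hypothetical callable returning the current values,
    e.g. {"temp": 41.5, "dbload": 0.7}. max_reports=None means run until
    the process is killed, which is how a real elim would be run.
    """
    sent = 0
    while max_reports is None or sent < max_reports:
        out.write(format_elim_report(read_indices()) + "\n")
        out.flush()  # unbuffered output, so each report is seen promptly
        sent += 1
        if max_reports is None or sent < max_reports:
            time.sleep(interval_s)
```

A real script would call report_loop() with a function that reads the
sensor or database counters on that host; nothing in the sketch is
specific to what is being measured.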

Data collection is done via a very simple Perl script which collects
lsload output at regular intervals and then splats the data into rrdtool
(www.rrdtool.com), which stores the historical data and produces graphs.
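
The collection step could look something like the following sketch,
written in Python rather than Perl and not taken from our actual script.
It assumes "lsload -l" prints a header row starting with HOST_NAME and
status followed by one row per host, with values such as "12%", "512M"
or "2G". The sample output, the rrd/ directory and the one-RRD-per-host
layout are all assumptions for illustration. The RRD files would need to
exist already (created with "rrdtool create") with matching data source
names.

```python
import subprocess
from typing import Dict, List, Optional

# Illustrative sample only; real lsload output may differ in columns and units.
SAMPLE_OUTPUT = """\
HOST_NAME  status  r15s  r1m  r15m  ut   pg   io  ls  it  tmp    swp  mem
node01     ok      0.1   0.2  0.3   12%  0.0  5   1   0   3000M  2G   512M
node02     unavail
"""

# Size suffixes are normalised to megabytes.
_SCALE = {"K": 1.0 / 1024.0, "M": 1.0, "G": 1024.0, "T": 1024.0 * 1024.0}


def parse_value(text: str) -> Optional[float]:
    """Convert a field such as '0.3', '12%', '512M' or '2G' to a float."""
    text = text.strip()
    factor = 1.0
    if text.endswith("%"):
        text = text[:-1]
    elif text and text[-1] in _SCALE:
        factor = _SCALE[text[-1]]
        text = text[:-1]
    try:
        return float(text) * factor
    except ValueError:
        return None  # e.g. '-' for a value that is not available


def parse_lsload(output: str) -> Dict[str, Dict[str, Optional[float]]]:
    """Map host name to {column name: value}, using the header row for names."""
    lines = [line for line in output.splitlines() if line.strip()]
    if not lines:
        return {}
    header = lines[0].split()
    hosts: Dict[str, Dict[str, Optional[float]]] = {}
    for line in lines[1:]:
        fields = line.split()
        if len(fields) != len(header):
            continue  # skip short rows, such as unavailable hosts
        # Columns 0 and 1 are the host name and status; the rest are metrics.
        hosts[fields[0]] = {
            name: parse_value(value)
            for name, value in zip(header[2:], fields[2:])
        }
    return hosts


def rrd_update_args(host: str, metrics: Dict[str, Optional[float]],
                    names: List[str], rrd_dir: str = "rrd") -> List[str]:
    """Build an 'rrdtool update' command line; 'U' marks an unknown value."""
    values = []
    for name in names:
        value = metrics.get(name)
        values.append("U" if value is None else str(value))
    return ["rrdtool", "update", f"{rrd_dir}/{host}.rrd",
            "--template", ":".join(names), "N:" + ":".join(values)]


def collect_once(names: List[str]) -> None:
    """Run lsload once and push the chosen columns into each host's RRD."""
    result = subprocess.run(["lsload", "-l"], capture_output=True,
                            text=True, check=True)
    for host, metrics in parse_lsload(result.stdout).items():
        subprocess.run(rrd_update_args(host, metrics, names), check=True)
```

Calling collect_once(["r1m", "mem"]) from cron, or in a loop with a
sleep, gives the "at regular intervals" part; graphs are then produced
separately with "rrdtool graph".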


Guy Coates.

Guy Coates,  Informatics System Group
The Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1SA, UK
Tel: +44 (0)1223 834244 ex 7199
