Beowulf and Big Brother

Leif Nixon nixon at
Wed Nov 13 04:51:58 EST 2002

canon at writes:

> We are using netsaint/nagios to monitor our cluster (a little over
> 300 nodes). Netsaint works well for monitoring services and basic
> host responds. [...] We just recently started using ganglia to
> monitor performance.

Sadly, there doesn't seem to be a good way of getting Nagios to
monitor data from Ganglia. You can feed the data into Nagios as
passive service checks, which sort of works, but you can't do passive
host checks.
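
For illustration, here is a minimal sketch of the passive service check
route, not a tested integration. It assumes gmond's XML dump (HOST
elements containing METRIC elements with NAME and VAL attributes) and
the Nagios external command PROCESS_SERVICE_CHECK_RESULT. The function
name, the thresholds, the "Load" service description and the sample
hosts are all made up for the example.

```python
import time
import xml.etree.ElementTree as ET

# Nagios plugin return codes
OK, WARNING, CRITICAL = 0, 1, 2


def passive_check_lines(gmond_xml, metric, warn, crit, service, now=None):
    """Build one passive service check command per host reporting `metric`.

    Hypothetical helper: the thresholds and the service description are
    assumptions and must match services defined in the Nagios config.
    """
    now = int(time.time()) if now is None else int(now)
    lines = []
    root = ET.fromstring(gmond_xml)
    for host in root.iter("HOST"):
        for m in host.iter("METRIC"):
            if m.get("NAME") != metric:
                continue
            value = float(m.get("VAL"))
            if value >= crit:
                code = CRITICAL
            elif value >= warn:
                code = WARNING
            else:
                code = OK
            # External command format:
            # [time] PROCESS_SERVICE_CHECK_RESULT;host;service;code;output
            lines.append(
                "[%d] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;%s=%s"
                % (now, host.get("NAME"), service, code, metric, m.get("VAL"))
            )
    return lines


# Made-up sample standing in for what gmond would report.
SAMPLE = """<GANGLIA_XML VERSION="2.5.0" SOURCE="gmond">
<CLUSTER NAME="demo">
<HOST NAME="node001" IP="10.0.0.1">
<METRIC NAME="load_one" VAL="0.42" TYPE="float"/>
</HOST>
<HOST NAME="node002" IP="10.0.0.2">
<METRIC NAME="load_one" VAL="9.50" TYPE="float"/>
</HOST>
</CLUSTER>
</GANGLIA_XML>"""

if __name__ == "__main__":
    for line in passive_check_lines(SAMPLE, "load_one", 4.0, 8.0, "Load"):
        print(line)
```

In a real setup the resulting lines would be appended to the Nagios
external command file, whose location depends on the installation. Note
that this only covers service checks. A node that has died simply stops
showing up with fresh data, which is exactly the host check case that
can't be handled this way.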

So, if you have several clusters and want Nagios to notify you if a
node dies, you need to set up Nagios in a distributed configuration,
with a Nagios server on each cluster's front-end. That really is a
pain, since you have to duplicate much of the configuration between
the central Nagios server and the distributed ones. Or rather, you
need to duplicate it *and* subtly change it. I started to write
scripts to do this in an automated fashion, but after a while threw my
hands up in disgust. No fun.

I'm having some thoughts about hacking monitoring abilities into
Ganglia, but haven't gotten around to actually doing anything about it.

Leif Nixon                                    Systems expert
National Supercomputer Centre           Linkoping University