Monitoring Software

From Cluster Documentation Project
  • Ganglia is a scalable distributed monitoring system for high-performance computing systems such as clusters and Grids. It is based on a hierarchical design targeted at federations of clusters. It leverages widely used technologies such as XML for data representation, XDR for compact, portable data transport, and RRDtool for data storage and visualization. It uses carefully engineered data structures and algorithms to achieve very low per-node overheads and high concurrency.
  • Nagios is a powerful monitoring system that enables organizations to identify and resolve IT infrastructure problems before they affect critical business processes.
  • Clumon is a performance monitoring system developed for monitoring Linux-based clusters at the National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign. The system is currently based on Performance Co-Pilot by SGI and the PBS scheduler. It uses MySQL as its database, and Apache with PHP for its bundled viewer.
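
The Ganglia entry above mentions XML as its data representation. As a rough illustration of what consuming such data can look like, the following Python sketch parses a small hand-written XML fragment and groups metric values by host. The sample document, host names, values, and the helper names metrics_by_host and hosts_over_load are all assumptions made for this example; the element and attribute names (CLUSTER, HOST, METRIC, NAME, VAL) are modelled on typical Ganglia output and should be checked against a real installation.

```python
import xml.etree.ElementTree as ET

# Illustrative sample only: a hand-written fragment shaped like the XML
# a Ganglia daemon reports. Host names and values are made up.
SAMPLE_XML = """\
<GANGLIA_XML VERSION="3.1.7" SOURCE="gmond">
  <CLUSTER NAME="example-cluster">
    <HOST NAME="node01" IP="10.0.0.1">
      <METRIC NAME="load_one" VAL="0.42" TYPE="float" UNITS=""/>
      <METRIC NAME="mem_free" VAL="2048000" TYPE="float" UNITS="KB"/>
    </HOST>
    <HOST NAME="node02" IP="10.0.0.2">
      <METRIC NAME="load_one" VAL="1.75" TYPE="float" UNITS=""/>
    </HOST>
  </CLUSTER>
</GANGLIA_XML>
"""


def metrics_by_host(xml_text):
    """Return {host_name: {metric_name: value_string}} (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    result = {}
    for host in root.iter("HOST"):
        result[host.get("NAME")] = {
            metric.get("NAME"): metric.get("VAL")
            for metric in host.findall("METRIC")
        }
    return result


def hosts_over_load(xml_text, threshold):
    """Return sorted host names whose load_one exceeds threshold (hypothetical helper)."""
    busy = []
    for name, metrics in metrics_by_host(xml_text).items():
        load = metrics.get("load_one")
        if load is not None and float(load) > threshold:
            busy.append(name)
    return sorted(busy)


if __name__ == "__main__":
    for name, metrics in metrics_by_host(SAMPLE_XML).items():
        print(name, metrics)
    print("hosts over load 1.0:", hosts_over_load(SAMPLE_XML, 1.0))
```

On a live cluster the XML text would come from the monitoring daemon rather than a string constant; the parsing step would stay the same as long as the element names match.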