Taming 100 nodes of number-hungry processors is a BIG job. Join Dan Stanzione as he guides you through the issues and challenges of a cluster administrator. And, yes, sleeping through the night as a cluster admin is possible.

A look at the history and state of some cluster distributions

Welcome to Cluster Monkey and the Cluster administration column. The theme of this column is clusters from a system administration point of view. Sure, you can create all kinds of fantastic applications on your cluster, but that doesn't do anyone any good unless you can log in!

The cluster administrator has some unique challenges that your normal, everyday system administrator doesn't face. For starters, you are likely to be responsible for many more machines. While you may have one cluster, that cluster may consist of hundreds of individual nodes. From a hardware perspective, maintaining 512 cluster nodes is just about as hard as maintaining 512 individual workstations. Nothing about putting machines in a cluster makes disks, fans, or power supplies less likely to burn out. From the software perspective, though, running a 512-node cluster can be very different from running 512 workstations. There are really two facets to cluster administration: things you must do on a per-node basis, like fixing disk drives, and things you must do only once on a per-cluster basis, such as (hopefully) adding new users. The goal of your cluster distribution or management software, and your goal as an administrator, is to turn as many per-node administration tasks into per-cluster tasks as possible.
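To make the per-node versus per-cluster distinction concrete, here is a minimal Python sketch of the idea: one call that fans a per-node task out across the whole cluster. Everything in it is an assumption for illustration. The hostnames (node001 through node512), the helper names, and the placeholder disk check are hypothetical; a real task would reach each node through whatever remote mechanism your cluster uses.

```python
from concurrent.futures import ThreadPoolExecutor


def node_names(prefix, count):
    """Generate hypothetical hostnames such as node001 ... node512."""
    return [f"{prefix}{i:03d}" for i in range(1, count + 1)]


def run_on_cluster(nodes, task, max_workers=32):
    """Apply one per-node task to every node; return results keyed by hostname.

    Failures are recorded rather than raised, so a single dead node
    does not stop the sweep across the rest of the cluster.
    """
    def attempt(node):
        try:
            return node, ("ok", task(node))
        except Exception as exc:
            return node, ("failed", str(exc))

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(attempt, nodes))


def check_disk(node):
    """Placeholder per-node task (assumption): pretend node007 has a bad disk."""
    if node == "node007":
        raise RuntimeError("disk not responding")
    return "healthy"


# One per-cluster call replaces 512 per-node checks.
results = run_on_cluster(node_names("node", 512), check_disk)
failed = sorted(n for n, (status, _) in results.items() if status == "failed")
print(len(results), "nodes checked; failed:", failed)
```

The administrator issues a single command and reads a single report. The per-node work still happens, but nobody has to do it by hand 512 times.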


