Administration
Taming 100 nodes of number-hungry processors is a BIG job. Join Dan Stanzione as he guides you through the issues and challenges of a cluster administrator. And, yes, sleeping through the night as a cluster admin is possible.
- Written by Dan Stanzione
Beyond NFS, but not too far
OK, OK, so there are plenty of file system articles on Cluster Monkey. But, from an administrator's perspective, no good discussion of cluster administration can continue without coming to that thorniest of issues, file systems and I/O. In the long history of parallel computing, I/O in most cases could have stood for Ignored/Oops! instead of Input/Output. In the more recent history of Beowulf clusters, I/O has finally received some quality attention from some quality people, but there is still no "silver bullet" file system, and providing the right kind of storage for the right job remains one of the biggest headaches for cluster administrators.
- Written by Dan Stanzione
We will even throw in a torque wrench for free
Over the last couple of columns, we've done a broad survey of the scope and history of resource management. In this column, we're going to dive a little more in depth into two of the leaders: PBS (aka Torque) and LSF.
PBS, LSF, and most of the other resource management packages described here over the last couple of months require jobs to be submitted via a shell script (and even with the few that don't require one, you are probably better off using one). Writing that script can be daunting for users, particularly those who are not Linux savvy. However, it doesn't have to be.
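To give a feel for how small such a script can be, here is a minimal sketch in Python that writes one out for a user. It is an illustration only, not the method from the column itself: the helper name `render_pbs_script` and its parameters are hypothetical, the example assumes a Torque-style PBS installation where the `nodes=N:ppn=M` resource syntax is accepted, and the job command (`mpirun ./hello_world`) is a placeholder.

```python
def render_pbs_script(job_name, nodes, ppn, walltime, command):
    """Return the text of a minimal PBS/Torque job script.

    Hypothetical helper for illustration; adjust the directives
    to match what your site's scheduler actually accepts.
    """
    lines = [
        "#!/bin/bash",
        f"#PBS -N {job_name}",               # job name shown in the queue
        f"#PBS -l nodes={nodes}:ppn={ppn}",  # nodes and processors per node
        f"#PBS -l walltime={walltime}",      # wall-clock limit, HH:MM:SS
        "#PBS -j oe",                        # merge stderr into stdout
        "",
        "cd $PBS_O_WORKDIR",                 # start where the job was submitted
        command,                             # placeholder for the real workload
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print a sample script; a user would save it and hand it to qsub.
    print(render_pbs_script("hello", 2, 4, "00:10:00", "mpirun ./hello_world"), end="")
```

An administrator could wrap something like this in a small command-line tool or web form, so that users only supply a job name, a size, and a command, and never have to remember the directive syntax.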
- Written by Dan Stanzione
A brief look at some current options
In our last column, we took a general look at the problem of resource management. This time around, we're going to take a quick stroll through the options for resource management on your cluster. There are a variety of possibilities available, which vary by capability, price, license, and platforms on which they work. Many of the resource management packages available have forked into multiple versions or changed names, leading to a somewhat confusing marketplace for the cluster administrator trying to decide what to do. In this column, we'll trace the origins and history of the popular players, to try and give you some insight into how we got to the state of things today, as the philosophical differences that led to the splits may have a big impact on the features you'll get.
- Written by Dan Stanzione
A Primer of Queuing and Scheduling
After a few months of reading about clusters on Cluster Monkey, you've gotten your cluster hardware, assembled it, selected a distribution, installed it, checked to make sure your nodes are up and running, learned some parallel programming, and written the highest performance "Hello, World" the universe has ever seen. So, you figure you're ready to turn your machine loose to your users, and amazing jumps in productivity are about to be seen.
In the immortal words of ESPN's Lee Corso, "Not so fast, my friend!"
Unless you're fortunate enough to be your cluster's only user (the ever growing Beowulf-in-the-basement crowd), selecting and configuring the appropriate resource management software is an absolutely critical step in administering your cluster. Outside of your choice of hardware, few things you do will have as big an impact on the user experience and performance your cluster provides.
- Written by Douglas Eadline
It's all about the Process
There are two predominant software approaches to Linux clustering. The first is the "Classic Beowulf," where each system has a copy of a complete operating system (more or less). In this approach, parallel processes are started remotely on worker nodes. The second method is based on a single system image (SSI) concept, where worker nodes have a minimal amount of software installed and processes are migrated from the host node. There is a big difference.