Administration

Taming 100 nodes of number hungry processors is BIG job. Join Dan Stanzione as he guides you through the issues and challenges of a cluster administrator. And, yes, sleeping through he night as a cluster admin is possible.

Beyond NFS, but not to far

OK, OK, so there are plenty of File System articles on Cluster Monkey. But, from an administrators perspective, no good discussion of cluster administration can continue without coming to that thorniest of issues, file systems and I/O. In the long history of parallel computing, I/O in most cases could have stood for Ignored/Oops! instead of Input/Output. In the more recent history of Beowulf clusters, I/O has finally received some quality attention by some quality people, but there is still no "silver bullet" file system, and providing the right kind of storage for the right job remains one of the biggest headaches for cluster administrators.

We will even throw in a torque wrench for free

Over the last couple of columns, we've done a broad survey of the scope and history of resource management. In this column, we're going to dive a little more in depth into two of the leaders: PBS (aka Torque) and LSF.

PBS, LSF, and most of the other resource management packages described here over the last couple of months require jobs to be submitted via a shell script (even for the few that don't require it, you are probably better off if you do). This requirement can be a daunting task for users, particularly those who are not Linux savvy. However, it doesn't have to be.

A brief look at some current options

In our last column, we took a general look at the problem of resource management. This time around, we're going to take a quick stroll through the options for resource management on your cluster. There are a variety of possibilities available, which vary by capability, price, license, and platforms on which they work. Many of the resource management packages available have forked into multiple versions or changed names, leading to a somewhat confusing marketplace for the cluster administrator trying to decide what to do. In this column, we'll trace the origins and history of the popular players, to try and give you some insight into how we got to the state of things today, as the philosophical differences that led to the splits may have a big impact on the features you'll get.

A Primer of Queuing and Scheduling

After a few months of reading about clusters on Cluster Monkey, you've gotten your cluster hardware, assembled it, selected a distribution, installed it, checked to make sure your nodes are up and running, learned some parallel programming, and written the highest performance "Hello, World" the universe has ever seen. So, you figure you're ready to turn your machine loose to your users, and amazing jumps in productivity are about to be seen.

In the immortal words of ESPN's Lee Corso, Not so fast, my friend!

Unless you're fortunate enough to be your cluster's only user (the ever growing Beowulf-in-the-basement crowd), selecting and configuring the appropriate resource management software is an absolutely critical step in administering your cluster. Outside of your choice of hardware, few things you do will have as big an impact on the user experience and performance your cluster provides.

It's all about the Process

There are two predominant software approaches for to Linux clustering. The first is the "Classic Beowulf" where each system has a copy of a complete operating system (more or less). In this approach parallel processes are started remotely on worker nodes. The second method is based on a single system image (SSI) concept where worker nodes have a minimal amount of software installed and processes are migrated from the host node. There is a big difference.

Search

Login And Newsletter

Create an account to access exclusive content, comment on articles, and receive our newsletters.

Feedburner


This work is licensed under CC BY-NC-SA 4.0

©2005-2023 Copyright Seagrove LLC, Some rights reserved. Except where otherwise noted, this site is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International. The Cluster Monkey Logo and Monkey Character are Trademarks of Seagrove LLC.