Cluster Queuing and Scheduling Packages


A brief look at some current options

In our last column, we took a general look at the problem of resource management. This time around, we're going to take a quick stroll through the options for resource management on your cluster. A variety of packages are available, differing in capability, price, license, and supported platforms. Many of them have forked into multiple versions or changed names, which makes for a somewhat confusing marketplace for the cluster administrator trying to decide what to do. In this column, we'll trace the origins and history of the popular players to give you some insight into how we arrived at the current state of things, since the philosophical differences that led to the splits may have a big impact on the features you'll get.

While the total number of choices is large, a few factors will greatly simplify your decision. A couple of packages dominate the world of production solutions. Some of the other options may be just as good, but only work with certain packages and distributions that you may or may not be using. By the end of this column, you'll have a rough picture of what's out there and what might work on your cluster. You would be well advised to check the project and vendor web sites for further information, as this topic is quite vast and individual needs vary greatly.

The Long and Storied History of PBS

The best-known and most widely used package in the world of clusters is the Portable Batch System, or PBS. Of course, it can't be that simple; PBS is a whole family of products. The original PBS was developed for the NASA Advanced Supercomputing Division at Ames Research Center by a group that would eventually become Veridian Corporation. PBS was written to replace the aging NQS (Network Queuing System) software, which still exists in various incarnations. The last surviving direct descendant is Generic NQS, which is still in use at a number of sites, though active development seems to have stopped.

PBS itself has split into several versions. After its initial deployment at NASA, PBS, like any generally useful and freely available software package, began to spread around the community. The team at Veridian concluded there was a market for this type of software, so they decided to continue developing PBS into a commercial product, which became PBS Professional. In order not to immediately orphan the existing PBS, they created the OpenPBS project, which would keep the source open and maintained for the version of PBS the community was using, though it wouldn't contain the new features being developed for PBS Pro. Both versions live on today. In March of 2003, Veridian sold PBS Professional to Altair Engineering.

While releasing a version of OpenPBS was initially a reasonable step for Veridian to take, the open source cluster community soon grew tired of its limitations. In addition, Altair Engineering still controls the OpenPBS source and is understandably not eager to spend time and effort maintaining community-contributed features that duplicate those available in the PBS Professional product. From the Altair perspective, OpenPBS is merely the gateway to PBS Professional. So, while OpenPBS is a solid, albeit limited, product that works well on clusters, it lacks features that are increasingly important for modern large clusters, such as scalable performance past a few tens of nodes and sophisticated scheduling algorithms (though it still retains some desirable qualities, like a lack of Windows support).

The open source community began developing capabilities for OpenPBS that went well beyond its initial capacity, but Veridian and then Altair had no incentive to incorporate them into the main OpenPBS code. Meanwhile, PBS had become so widely adopted that going in an entirely different direction didn't seem feasible. So a new project was born, initially known as Scalable PBS (due to trademark issues, it became known as Storm, and now Torque, the Tera-scale Open-source Resource and QUEue manager). Torque was to be the all open source descendant of PBS and, as the original name indicated, was to address issues of scalability, as well as performance, fault tolerance, more sophisticated scheduling and scheduling interfaces, and the incorporation of the many patches the community had developed for OpenPBS. While still (and probably perpetually) under active development, Torque is ready for use, reasonably stable, and developing a fairly wide following.

All three major forks of PBS are still active and in use: PBS Professional as a commercial product making advances in fault tolerance and scalability, OpenPBS as the solid standby in wide use and the default in packages like OSCAR and ROCKS, and Torque as the open source community's development platform of choice, used by a large group of do-it-yourselfers.

PBS (in all incarnations) consists of several components: a server, a scheduler, and the process that runs on all the compute nodes, known as a MOM. The server runs only on the head node; it is the process that actually accepts job submissions, maintains the queue of waiting and running jobs, and reports when jobs are completed. The MOM, or Machine Oriented Mini-server, is fairly lightweight, which is a good thing, as a copy of the MOM process must run on every node in your cluster. The MOM interacts with the server to actually run each of your queued tasks on the compute nodes. The scheduler makes decisions about the order of jobs in the queue; most significantly, which job will run next. One of the common features of all versions of PBS is that the scheduler can be replaced with an external scheduler implementing a different scheduling algorithm. The default scheduler in OpenPBS simply employs a first-come, first-served algorithm. PBS Pro, of course, uses a substantially more sophisticated policy.
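To make the division of labor concrete, here is a toy model in which a "server" owns the queue and node accounting, while the choice of which job runs next is delegated to a swappable policy function. This is a minimal sketch under stated assumptions, not the PBS API: the names `Job`, `Server`, `fcfs_policy`, and `first_fit_policy` are hypothetical, jobs request whole nodes, and job launch (the MOM's role) is reduced to a list append.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Job:
    job_id: str
    nodes: int        # number of whole nodes requested (hypothetical model)
    submit_time: int  # submission order; smaller means earlier


# A policy inspects the queue and the free node count and returns the next
# job to start, or None if nothing should start right now.
Policy = Callable[[List[Job], int], Optional[Job]]


def fcfs_policy(queue: List[Job], free_nodes: int) -> Optional[Job]:
    """First-come, first-served: only the oldest waiting job may start."""
    if not queue:
        return None
    oldest = min(queue, key=lambda j: j.submit_time)
    return oldest if oldest.nodes <= free_nodes else None


def first_fit_policy(queue: List[Job], free_nodes: int) -> Optional[Job]:
    """Alternative policy: start the oldest job that fits in the free nodes."""
    for job in sorted(queue, key=lambda j: j.submit_time):
        if job.nodes <= free_nodes:
            return job
    return None


class Server:
    """Hypothetical stand-in for the head-node server process."""

    def __init__(self, total_nodes: int, policy: Policy = fcfs_policy):
        self.free_nodes = total_nodes
        self.queue: List[Job] = []
        self.running: List[Job] = []
        self.policy = policy  # pluggable, like replacing the PBS scheduler

    def submit(self, job: Job) -> None:
        self.queue.append(job)

    def schedule_cycle(self) -> List[str]:
        """Ask the policy for jobs until it declines; return IDs started."""
        started = []
        while True:
            job = self.policy(self.queue, self.free_nodes)
            if job is None:
                break
            self.queue.remove(job)
            self.free_nodes -= job.nodes
            self.running.append(job)  # real PBS would hand this to the MOMs
            started.append(job.job_id)
        return started


def demo(policy: Policy) -> List[str]:
    server = Server(total_nodes=8, policy=policy)
    for job in (Job("a", 6, 0), Job("b", 4, 1), Job("c", 2, 2)):
        server.submit(job)
    return server.schedule_cycle()
```

With eight nodes and jobs needing 6, 4, and 2 nodes, `demo(fcfs_policy)` starts only `a` and then waits, because `b` is next in line and doesn't fit; `demo(first_fit_policy)` also slips `c` into the two idle nodes. Swapping the policy changes behavior without touching the server, which is the point of a replaceable scheduler. Note that the first-fit policy keeps more nodes busy but can starve large jobs indefinitely.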

Because of the plug-in scheduler feature, the most common way to run PBS is to replace the built-in scheduler with the Maui scheduler. This arrangement is the standard setup in the OSCAR system described last month, for instance. Maui is worthy of a column of its own, but basically Maui is a high-powered, open source, standalone scheduler. Maui focuses on scheduling functionality and leaves the problems of launching jobs and dealing with users to resource managers like PBS. Maui achieves many of the scheduling goals described in a previous column through a planning scheduling algorithm that supports reservations for particular jobs, and a backfill mechanism that looks for gaps in the planned schedule to squeeze in more jobs. While PBS alone, particularly OpenPBS, is not much of a scheduler, it is a very solid resource manager, and the addition of Maui makes for a truly powerful combination.
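The reservation-plus-backfill idea can be illustrated with a simplified, "EASY"-style backfill pass. This is a sketch of the general technique under simplifying assumptions, not Maui's actual implementation: Maui adds priorities, fairshare, multiple reservations, and much more. Here the hypothetical `backfill_schedule` function assumes whole-node jobs, trusts each job's requested walltime as its runtime, and gives a reservation only to the first job that cannot start.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Job:
    job_id: str
    nodes: int     # whole nodes requested (hypothetical model)
    walltime: int  # requested runtime limit, treated as the actual runtime


def backfill_schedule(
    now: int,
    total_nodes: int,
    running: List[Tuple[int, int]],  # (end_time, nodes) of running jobs
    queue: List[Job],                # waiting jobs, highest priority first
) -> Tuple[List[str], Optional[Tuple[str, int]]]:
    """Return (job IDs started now, (reserved job ID, reserved start) or None)."""
    free = total_nodes - sum(n for _, n in running)
    running = list(running)
    pending = list(queue)
    started: List[str] = []

    # 1. Start jobs strictly in priority order while they fit.
    while pending and pending[0].nodes <= free:
        job = pending.pop(0)
        free -= job.nodes
        running.append((now + job.walltime, job.nodes))
        started.append(job.job_id)
    if not pending:
        return started, None

    # 2. Reserve nodes for the blocked head job: walk running jobs in order of
    #    completion until enough nodes will be free (the "shadow time").
    head = pending.pop(0)
    if head.nodes > total_nodes:
        raise ValueError(f"job {head.job_id} can never fit on this cluster")
    avail, shadow_time = free, now
    for end, n in sorted(running):
        if avail >= head.nodes:
            break
        avail += n
        shadow_time = end
    extra = avail - head.nodes  # nodes the reservation will not need

    # 3. Backfill: a later job may start now if it fits in the idle nodes and
    #    either finishes before the reservation or only uses the spare nodes.
    for job in pending:
        if job.nodes > free:
            continue
        ends_in_time = now + job.walltime <= shadow_time
        if ends_in_time or job.nodes <= extra:
            free -= job.nodes
            if not ends_in_time:
                extra -= job.nodes
            started.append(job.job_id)

    return started, (head.job_id, shadow_time)


# 10 nodes; a 6-node job runs until t=10, leaving 4 idle nodes at t=0.
queue = [Job("A", 8, 5), Job("B", 2, 5), Job("C", 4, 20), Job("D", 2, 20)]
result = backfill_schedule(0, 10, [(10, 6)], queue)
```

In this example, `A` needs 8 nodes and is reserved to start at t=10, when the running job ends. `B` is backfilled because it finishes by t=5, before the reservation; `D` runs past t=10 but fits in the 2 nodes `A` won't need; `C` is skipped because no idle nodes remain for it. Neither backfilled job delays `A`, which is what distinguishes backfill from the plain first-fit approach.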

    ©2005-2012 Copyright Seagrove LLC, Some rights reserved. Except where otherwise noted, this site is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License. The Cluster Monkey Logo and Monkey Character are Trademarks of Seagrove LLC.