
We will even throw in a torque wrench for free

Over the last couple of columns, we've done a broad survey of the scope and history of resource management. In this column, we're going to dive a little more in depth into two of the leaders: PBS (aka Torque) and LSF.

PBS, LSF, and most of the other resource management packages described here over the last couple of months require jobs to be submitted via a shell script (even with the few that don't require it, you are probably better off using one). Writing such a script can be daunting for users, particularly those who are not Linux savvy. However, it doesn't have to be.

While both PBS and LSF have a large and powerful set of options, most users and most jobs do not require this capability. In a typical scenario, a user probably cares about only a few characteristics of a job: the name of the program to run, how many processors it will run on, where to get its input data, where to put its output data, and how to find out when it's done. A job script will also frequently require some information the user doesn't typically want to care about, such as limits on the resources the job can consume (most often, anticipated running time).

Since the typical job only needs to convey these few things to the resource manager, a good practice is to prepare a few template scripts for users that cover the common cases. Provide your expertise on the more complicated ones, rather than trying to instruct all users on all cases. That is a hopeless task: your average user will simply not get as fired up about the queuing system as you will. (If they did, they wouldn't need you to administer the cluster for them in the first place.)

The Scripts

The listings below are some bare bones scripts for a very simple case, assuming you are using only a single queue. Each script handles the same job, one on PBS, and one on LSF. Note that although these are fairly different packages to administer, the user view is fairly similar.

In Listing One, you will find a PBS sample job script. Listing Two is an equivalent script, but one that will run in the LSF environment. Both scripts start out as normal UNIX shell scripts, meaning you have to begin them with the characters #! and the path to the shell you want them to execute in. In this case, I've chosen /bin/csh for both scripts, but you may use the shell of your choice.

In typical shell scripts, lines beginning with a # are considered comments, with the exception of the #! first line. In both PBS and LSF, there is an additional exception. Lines beginning with #PBS (for PBS) or #BSUB (for LSF) are directives to the resource manager that describe how the job is to be handled (for PBS, the directive prefix is actually configurable, for example with the -C option to qsub). In both PBS and LSF, these directives can also be passed directly on the command line when the job is submitted to a queue. Any remaining lines that begin with # are still treated as comments.
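Because directives are ordinary comments as far as the shell is concerned, they are easy to pick out of a script with standard tools. The following is a minimal sketch, assuming the default #PBS and #BSUB prefixes; list_directives is a hypothetical helper name, not part of either package.

```shell
#!/bin/sh
# list_directives SCRIPT
# Hypothetical helper: print only the resource manager directives in SCRIPT.
# Assumes the default prefixes (#PBS and #BSUB). The #! line and ordinary
# comments such as "### Job name" do not match and are skipped.
list_directives() {
    grep -E '^#(PBS|BSUB)( |$)' "$1"
}
```

Running it on a user's script is a quick way to see what the resource manager will be told without reading the whole file.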

Though it isn't a requirement, it's good practice to give every job a name. This task is done with a directive in both cases. Line 3 in Listing One shows how to do this in PBS, using a directive with the -N option followed by the name. Line 3 in Listing Two does the same job in LSF, using a directive with the -J option.

A requirement for jobs in both systems is to tell the resource manager where to stick output and errors. In a batch system, things that would normally be printed to the screen (if you had run the job interactively) must be captured in files. Lines 5 and 6 in Listing One show how to redirect the standard output and error from your job into files of your choosing, in this case, the files samplejob.out and samplejob.err. Lines 4 and 5 in Listing Two do the same thing, but make use of an LSF feature that substitutes the job ID for %J in the file name. Note that %J expands to the numeric ID that LSF assigns when the job is submitted, not to the job name, so the files produced by Listing Two are named after that ID rather than samplejob.

Listing One: Sample PBS job script

 1  #!/bin/csh
 2  ### Job name
 3  #PBS -N samplejob
 4  ### Output
 5  #PBS -o samplejob.out
 6  #PBS -e samplejob.err
 7  ### Queue name
 8  #PBS -q workq
 9  #PBS -M <e-mail address>
10  #PBS -l nodes=32,walltime=0:15:00
11  #PBS -m be
12  ### Script commands
13  echo "Job Starting..."
14  echo "Submitted from Directory: $PBS_O_WORKDIR"
15  my_job args
16  exit

Line 8 of Listing One selects the queue to which you wish to submit your job, in this case one called workq (the default queue if you use OSCAR). If you are running a cluster with only a few users, you may only want a single queue. Larger sites typically have multiple queues, to separate large jobs from small ones or users of different priority. The LSF and PBS syntax is the same here (see line 6 of Listing Two): a directive with the -q option, followed by the name of the queue. You can determine which queues exist on your system using the qstat -q (PBS) or bqueues (LSF) commands.

Next, you need to specify the resources you want your job to use. Either queuing system can limit your resources in a number of ways, such as by number of processors, by total wall clock time, or by memory usage. In our example, we'll limit our job in two ways: to 32 processors, and to 15 minutes of wall clock time. LSF has a separate directive for each type of limit. In Listing Two, line 2 handles the processor count, and line 7 handles the wall clock time limit. In PBS, all limits are set with the -l option and are placed in a comma-separated list, as shown in line 10 of Listing One.
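One detail worth watching when you maintain templates for both systems is that the two time limits are not written the same way. The sketch below assumes that PBS reads walltime as [[HH:]MM:]SS and that LSF reads -W as [HH:]MM; the function names are hypothetical and simply format a limit given in whole minutes.

```shell
#!/bin/sh
# Format a wall clock limit, given in whole minutes, for each system.
# Assumption: PBS walltime is [[HH:]MM:]SS, while LSF -W is [HH:]MM.
# Both function names are made up for this illustration.

pbs_walltime() {
    # $1: limit in minutes -> HH:MM:SS
    printf '%02d:%02d:00\n' $(( $1 / 60 )) $(( $1 % 60 ))
}

lsf_runlimit() {
    # $1: limit in minutes -> H:MM
    printf '%d:%02d\n' $(( $1 / 60 )) $(( $1 % 60 ))
}
```

With these, pbs_walltime 15 gives a value suitable for "-l walltime=" and lsf_runlimit 15 gives one suitable for "-W", so a single number supplied by the user can feed both templates.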

You will probably also want your template script to include options for notifying the user by e-mail when something happens with the job, such as start of execution, abort, or completion. You can configure the list of e-mail addresses used, and the events that result in a mail message being generated. The sample scripts mail a single user at the beginning and end of job execution. For PBS, one directive sets up the e-mail address list (line 9), and a second directive sets all notification options (line 11). The be stands for beginning and end; you can use b, e, or a (abort), or any combination of the three. LSF uses separate directives for each type of notification (see lines 8, 9, and 10): -u sets the user e-mail list, -B sends mail at the beginning of the job, and -N notifies you upon completion. The completion report will also contain some statistics about the job, such as how long it actually executed.

Finally, once you've finished all your directives, your script needs to actually run your job. You can simply place in the script the commands that you would normally use to run the job from the command line. You can also run any other sequence of shell commands, and the output will be captured in your standard output file. Using the echo command to place some labels in your file is typically helpful here. In this case, in line 14 of Listing One and line 12 of Listing Two, the directory the job was submitted from will be placed in the file (note the different environment variables for PBS vs. LSF).

Listing Two: Sample LSF job script

 1  #!/bin/csh
 2  #BSUB -n 32
 3  #BSUB -J samplejob
 4  #BSUB -o %J.out
 5  #BSUB -e %J.err
 6  #BSUB -q workq
 7  #BSUB -W 0:15
 8  #BSUB -u <e-mail address>
 9  #BSUB -B
10  #BSUB -N
11  echo "Job Starting"
12  echo "Submitted from Directory: $LS_SUBCWD"
13  ./my_job my_args
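Since the two listings differ only in their directives, a template can be produced by a small generator instead of being copied and edited by hand. The following is a rough sketch of that idea, not a production tool: make_job_script and its argument order are assumptions made for illustration, and the time limit is passed through unchanged, so it must already be in the format the target system expects.

```shell
#!/bin/sh
# make_job_script SYSTEM NAME NPROCS LIMIT COMMAND...
# Hypothetical helper: write a minimal job script to standard output.
# SYSTEM is "pbs" or "lsf"; LIMIT is copied verbatim into the directive.
make_job_script() {
    system=$1 name=$2 nprocs=$3 limit=$4
    shift 4
    case $system in
        pbs)
            echo '#!/bin/csh'
            echo "#PBS -N $name"
            echo "#PBS -o $name.out"
            echo "#PBS -e $name.err"
            echo "#PBS -l nodes=$nprocs,walltime=$limit"
            ;;
        lsf)
            echo '#!/bin/csh'
            echo "#BSUB -n $nprocs"
            echo "#BSUB -J $name"
            echo "#BSUB -o %J.out"
            echo "#BSUB -e %J.err"
            echo "#BSUB -W $limit"
            ;;
        *)
            echo "unknown system: $system" >&2
            return 1
            ;;
    esac
    # Everything after the fourth argument is the command the job runs.
    echo "$*"
}
```

For example, "make_job_script pbs samplejob 32 0:15:00 ./my_job my_args" prints a script close to Listing One, minus the queue and mail directives, which a site would add to suit its own setup.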

Submitting Scripts

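Once you have completed your job script, you are ready to submit it to the resource manager. In PBS, this task is done with the qsub command, which takes the script's file name as its argument. In LSF, it is done with the bsub command, which reads the script from standard input, so the script is redirected into bsub rather than named as an argument.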
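As a small illustration of the two submission forms, the sketch below prints the command line for each system rather than running it. The helper name submit_command and the file names in the usage note are hypothetical; the one real difference it captures is that qsub is given the script as an argument, while bsub reads it from standard input.

```shell
#!/bin/sh
# submit_command SYSTEM SCRIPT
# Hypothetical helper: print (not run) the command that submits SCRIPT.
submit_command() {
    case $1 in
        pbs) echo "qsub $2" ;;     # script passed as an argument
        lsf) echo "bsub < $2" ;;   # script fed to bsub on standard input
        *)
            echo "unknown system: $1" >&2
            return 1
            ;;
    esac
}
```

For instance, "submit_command pbs samplejob.pbs" prints "qsub samplejob.pbs", and "submit_command lsf samplejob.lsf" prints "bsub < samplejob.lsf".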


All of the options used in the job scripts can also be passed directly on the command line to either the qsub or bsub command. This approach shouldn't be used as a substitute for creating job scripts, but it can be useful in certain cases. For instance, if you wanted to vary the number of nodes you ran a job on to measure its performance, and you didn't want to change your script for each run, you could simply remove the directive about nodes from the script, and submit commands to the queue such as:

 $ qsub -l nodes=4 <script>
 $ qsub -l nodes=8 <script>

You could even place these commands inside another script (and probably should).
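A wrapper for such a sweep might look like the sketch below. It assumes a PBS job script whose nodes directive has been removed; the function name sweep_nodes and the QSUB variable are inventions for this example, the latter so that the loop can be tried as a dry run before anything reaches the queue.

```shell
#!/bin/sh
# sweep_nodes SCRIPT COUNT...
# Hypothetical wrapper: submit SCRIPT once for each node count given.
# QSUB defaults to the real qsub; set QSUB='echo qsub' for a dry run
# that only prints the commands.
QSUB=${QSUB:-qsub}

sweep_nodes() {
    script=$1
    shift
    for n in "$@"; do
        # $QSUB is left unquoted on purpose so 'echo qsub' splits in two.
        $QSUB -l nodes="$n" "$script"
    done
}

# Example (not run here): sweep_nodes samplejob.pbs 4 8 16
```

Keeping the node counts as arguments means the same wrapper serves any scaling study without being edited.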

Parting Thoughts

With the distribution of a few sample scripts, you can save your users a lot of time and effort. The scripts here provide a starting point, but you should probably provide a sequence of steadily more sophisticated scripts. The next step would be to add directives to define dependencies; for instance, submitting jobs that won't start until other jobs finish. This feature is particularly useful if you have jobs producing files that are input to other jobs. There are plenty more options, but we're out of space for this month. Don't worry about mastering them all; the simple set provided here will get you through a lot of jobs. Happy batching!

Finally, an astute reader pointed out that we missed a resource manager in the last issue. SLURM is a production resource manager used and developed at Lawrence Livermore National Laboratory. It is now more widely available under the GNU General Public License. Like PBS and LSF, it allows for integration with Maui and other schedulers. One of the strengths of SLURM is its ability to tolerate node failures and continue functioning. SLURM is already in use on clusters of 1,000 nodes.

Thanks to Karl Schulz at the Texas Advanced Computing Center, for access to scripts from their production LSF environment.

Sidebar One: Resources
Portable Batch System (PBS/Torque)

Load Sharing Facility (LSF)

This article was originally published in ClusterWorld Magazine. It has been updated and formatted for the web. If you want to read more about HPC clusters and Linux you may wish to visit Linux Magazine.

Dan Stanzione is currently the Director of High Performance Computing for the Ira A. Fulton School of Engineering at Arizona State University. He previously held appointments in the Parallel Architecture Research Lab at Clemson University and at the National Science Foundation.