LSF/PBS Scripting Nuts and Bolts
Written by Dan Stanzione   
Tuesday, 13 June 2006
You will probably also want your template script to include options for notifying the user by e-mail when something happens to a job, such as the start of execution, an abort, or completion. You can configure both the list of e-mail addresses and the events that generate a message. The sample scripts mail a single user at the beginning and end of job execution. For PBS, one directive sets the e-mail address list (line 9), and a second directive sets all notification options (line 11). The value be stands for beginning and end; you can use b, e, or a (abort), or any combination of the three. LSF uses a separate directive for each type of notification (see lines 8, 9, and 10): -u sets the user e-mail list, -B sends mail at the beginning of the job, and -N notifies you upon completion. The completion report also contains some statistics about the job, such as how long it actually ran.
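
As a minimal sketch of the PBS side, the fragment below shows the two notification directives on their own. The -M and -m options are standard PBS; the job name and the address user@example.com are placeholders, and the line positions here do not correspond to those in Listing One.

```shell
#!/bin/sh
# Hypothetical PBS fragment: only the notification directives are shown.
#PBS -N samplejob
# -M sets the list of e-mail recipients (placeholder address).
#PBS -M user@example.com
# -m selects the events: b = begin, e = end, a = abort (e.g. "abe" for all three).
#PBS -m be
echo "Job Starting"
```

Because the directives are shell comments, the script still runs unchanged outside the scheduler, which is handy for a quick test.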

Finally, once you've finished all your directives, your script needs to actually run your job. Simply place in the script the commands that you would normally use to run the job from the command line. You can also run any other sequence of shell commands, and their output will be captured in your standard output file. Using the echo command to place some labels in that file is typically helpful. In this case, line 14 of Listing One and line 12 of Listing Two (the LSF script below) write the directory the job was submitted from to the file (note the different environment variables for PBS vs. LSF).

 1  #!/bin/csh
 2  #BSUB -n 32
 3  #BSUB -J samplejob
 4  #BSUB -o %J.out
 5  #BSUB -e %J.err
 6  #BSUB -q workq
 7  #BSUB -W 0:15
 8  #BSUB -u dstanzi@clemson.edu
 9  #BSUB -B
10  #BSUB -N
11  echo "Job Starting"
12  echo "Submitted from Directory: $LS_SUBCWD"
13  ./my_job my_args
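
If you want one job body that works under either resource manager, a small helper can pick whichever submission-directory variable is set. This is only a sketch: PBS_O_WORKDIR is the PBS counterpart of LSF's LS_SUBCWD, and the function name submit_dir is made up for illustration.

```shell
#!/bin/sh
# Print the directory the job was submitted from.
# PBS exports PBS_O_WORKDIR; LSF exports LS_SUBCWD. When neither is set
# (for example, in an interactive test run), fall back to the current directory.
submit_dir() {
    if [ -n "${PBS_O_WORKDIR:-}" ]; then
        echo "$PBS_O_WORKDIR"
    elif [ -n "${LS_SUBCWD:-}" ]; then
        echo "$LS_SUBCWD"
    else
        pwd
    fi
}

echo "Submitted from Directory: $(submit_dir)"
```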

Submitting Scripts

Once you have completed your job script, you are ready to submit it to the resource manager. This task can be done in PBS with the qsub command:

$ qsub pbs_sample_script.sh

or, for LSF, with the bsub command, which reads the script (and its #BSUB directives) from standard input:

$ bsub < lsf_sample_script.sh

All of the options used in the job scripts can also be passed directly on the command line to either the qsub or bsub command. This approach shouldn't be used as a substitute for creating job scripts, but it can be useful in certain cases. For instance, if you wanted to vary the number of nodes you ran a job on to measure its performance, and you didn't want to change your script for each run, you could simply remove the directive about nodes from the script, and submit commands to the queue such as:

$ qsub -l nodes=4 pbs_sample_script.sh
$ qsub -l nodes=8 pbs_sample_script.sh

You could even place these commands inside another script (and probably should).
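
A wrapper along these lines might look like the sketch below. The function name submit_sweep and the SUBMIT variable are made up for illustration. By default the script only prints the qsub commands it would run, so it can be tried on a machine without PBS installed.

```shell
#!/bin/sh
# Submit the same PBS script at several node counts.
# SUBMIT defaults to a dry run that only prints each command;
# set SUBMIT=qsub in the environment to submit for real.
SUBMIT="${SUBMIT:-echo qsub}"

# submit_sweep SCRIPT NODE_COUNT...
submit_sweep() {
    script="$1"
    shift
    for n in "$@"; do
        # $SUBMIT is left unquoted on purpose so "echo qsub" splits into two words.
        $SUBMIT -l nodes="$n" "$script"
    done
}

submit_sweep pbs_sample_script.sh 4 8 16
```

Running it with SUBMIT=qsub in the environment performs the actual submissions. An LSF version would follow the same pattern, passing -n to bsub and feeding the script on standard input.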

Parting Thoughts

With the distribution of a few sample scripts, you can save your users a lot of time and effort. The scripts here provide a starting point, but you should probably provide a sequence of steadily more sophisticated scripts. The next step would be to add directives that define dependencies; for instance, submitting jobs that won't start until other jobs finish. This feature is particularly useful if you have jobs producing files that are input to other jobs. There are plenty more options, but we're out of space for this month. Don't worry about mastering them all; the simple set provided here will get you through a lot of jobs. Happy batching!
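
As a taste of the dependency feature mentioned above, here is a minimal PBS sketch that is not taken from the sample scripts. It relies on qsub printing the new job's identifier on standard output and on the -W depend=afterok option. The function name chain_jobs and the QSUB override are illustrative assumptions.

```shell
#!/bin/sh
# Submit two PBS jobs so that the second is held until the first
# finishes with a zero exit status.
# QSUB can be overridden (for example, with a stub) to test without a scheduler.
QSUB="${QSUB:-qsub}"

# chain_jobs FIRST_SCRIPT SECOND_SCRIPT
chain_jobs() {
    # qsub prints the identifier of the job it just queued.
    first_id=$($QSUB "$1") || return 1
    # afterok: start the second job only if the first one exits successfully.
    $QSUB -W depend=afterok:"$first_id" "$2"
}
```

A call such as chain_jobs produce_data.sh consume_data.sh (both script names hypothetical) would queue the pair. LSF expresses the same idea with the -w option to bsub, for example -w "done(job_id)".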

Finally, an astute reader pointed out that we missed a resource manager in the last issue. SLURM is a production resource manager used and developed at Lawrence Livermore National Laboratory. It is now more widely available under the GNU General Public License. Like PBS and LSF, it allows for integration with Maui and other schedulers. One of the strengths of SLURM is its ability to tolerate node failures and continue functioning. SLURM is already in use on clusters of 1,000 nodes.

Thanks to Karl Schulz at the Texas Advanced Computing Center for access to scripts from their production LSF environment.

Sidebar One: Resources
Portable Batch System (PBS/Torque)

Load Sharing Facility (LSF)

This article was originally published in ClusterWorld Magazine. It has been updated and formatted for the web. If you want to read more about HPC clusters and Linux you may wish to visit Linux Magazine.

Dan Stanzione is currently the Director of High Performance Computing for the Ira A. Fulton School of Engineering at Arizona State University. He previously held appointments in the Parallel Architecture Research Lab at Clemson University and at the National Science Foundation.

Last Updated ( Tuesday, 26 September 2006 )
 

Creative Commons License
  ©2005-2008 Seagrove LLC, Some rights reserved. Except where otherwise noted, this site is licensed under a
Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License.
Cluster Monkey Logo and Monkey Character are Trademarks of Seagrove LLC.