[Beowulf] scheduler policy design

Bill Bryce bill at platform.com
Wed Apr 25 09:00:11 EDT 2007

No, users typically do not know the runtime profile well, so it is up to
the administrators of the batch system to configure LSF so that it can
recognize these types of jobs.  This can be done in several ways (and I
expect you can do something similar in SGE as well).  Some admins go the
simple route and create a queue that accepts only jobs of a certain
'type'.  The runtime profile is set on the queue, so users don't have to
remember it and can just submit the job.

Admittedly this is a very simplistic way of doing things, so we are
looking at better ways.  One that we have started implementing (in the
latest version of LSF) is 'application encapsulation': essentially,
everything about the job 'profile' (memory, swap, the type of hosts it
needs, licenses, runtime profile) is stored in an XML file that the
scheduler uses to match the job to available resources.

This still doesn't solve the problem of figuring out the runtime profile
of the job; for that you have to run the job several times and record the
actual resource usage.  LSF keeps track of runtime usage for memory,
swap, virtual memory and the number of processes/threads, and that
information can be used to determine the profile of the job.  Once the
profile is known, all the user has to do is submit the job and state in
the submission what type of job it is; LSF then knows how to run it.

It would be cool if there were a tool (it doesn't have to be part of any
particular batch system) that could take data from hundreds or even
thousands of runs of a particular type of job and work out the 'best
profile' for it.  I would expect such a tool not to work in all cases,
but it would be an interesting project.
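A rough sketch of the kind of tool described above, in Python.  Everything
here is hypothetical: the JobRun record, the use of a nearest-rank 95th
percentile and the 10% headroom are assumptions chosen for illustration,
not how LSF or any other batch system actually derives a profile.

```python
import math
from dataclasses import dataclass


@dataclass
class JobRun:
    """Peak usage recorded for one finished run (hypothetical record format)."""
    mem_mb: float
    swap_mb: float
    vmem_mb: float
    threads: int


def percentile(values, pct):
    """Nearest-rank percentile: smallest sample covering pct% of the samples."""
    if not values:
        raise ValueError("need at least one sample")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def best_profile(runs, pct=95, headroom_pct=10):
    """Suggest a reservation that would have covered pct% of past runs.

    Memory-type metrics are padded by headroom_pct percent; the thread
    count is taken as-is.  Both policies are assumptions, not LSF behaviour.
    """
    profile = {}
    for field in ("mem_mb", "swap_mb", "vmem_mb"):
        peak = percentile([getattr(r, field) for r in runs], pct)
        profile[field] = math.ceil(peak * (100 + headroom_pct) / 100)
    profile["threads"] = percentile([r.threads for r in runs], pct)
    return profile


if __name__ == "__main__":
    # Twenty synthetic runs whose peak memory grows from 100 MB to 2000 MB.
    history = [JobRun(mem_mb=m, swap_mb=0, vmem_mb=2 * m, threads=4)
               for m in range(100, 2001, 100)]
    print(best_profile(history))
```

On the synthetic history this prints {'mem_mb': 2090, 'swap_mb': 0,
'vmem_mb': 4180, 'threads': 4}.  A percentile is used instead of the
maximum so that one pathological run does not inflate the reservation for
every future job; the cost is that the rare large run may be under-reserved,
which is one of the cases where such a tool would not work.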



-----Original Message-----
From: Toon Knapen [mailto:toon.knapen at fft.be] 
Sent: Wednesday, April 25, 2007 3:43 AM
To: Bill Bryce
Cc: Tim Cutts; beowulf at beowulf.org
Subject: Re: [Beowulf] scheduler policy design

Bill Bryce wrote:

> 2) use the LSF resource reservation mechanism.  This is more complex;
> essentially you can boil it down to the idea that you tell LSF to
> 'up the resource usage' on a resource, making it look like more I/O is
> consumed than really is for a given period of time, and apply a
> decay function so that the 'artificial bump in I/O' decreases over
> time...
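The decaying reservation described in the quoted text can be modelled
roughly as below.  This is a sketch under assumptions: the Reservation
record, the linear decay and the threshold check are illustrative names and
choices, not LSF internals.  LSF's rusage resource-requirement strings have
duration and decay keywords aimed at this kind of behaviour; check the LSF
documentation for their exact semantics rather than relying on this model.

```python
from dataclasses import dataclass


@dataclass
class Reservation:
    """Artificial I/O 'bump' charged to a host when a job starts (hypothetical)."""
    start: float     # time the job was dispatched, e.g. in minutes
    bump: float      # extra I/O load to pretend the job is using, e.g. MB/s
    duration: float  # time over which the bump decays to zero


def reserved_load(res, now):
    """Linearly decaying bump: full size at dispatch, zero after `duration`."""
    elapsed = now - res.start
    if elapsed <= 0:
        return res.bump
    if elapsed >= res.duration:
        return 0.0
    return res.bump * (1 - elapsed / res.duration)


def effective_io_load(measured, reservations, now):
    """Load the scheduler compares against its threshold: the measured
    I/O plus whatever is left of each decaying reservation."""
    return measured + sum(reserved_load(r, now) for r in reservations)


def can_dispatch(measured, reservations, now, threshold):
    """Send another I/O-heavy job only while the effective load is below threshold."""
    return effective_io_load(measured, reservations, now) < threshold
```

With a bump of 100 and a duration of 10, a job dispatched at t=0 is charged
100 at dispatch, 50 at t=5 and nothing from t=10 on.  The point of the
artificial bump is that a freshly started job has not yet generated much
measurable I/O, so without it the scheduler would keep piling new jobs onto
the same file server.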

Interesting. However, this approach requires that the I/O profile of the
application is known. Additionally, it requires the users of the
application (who are generally not IT guys) to know and understand
this info and pass it on to the scheduler when they launch their app.
In your experience, do you manage to convince real-life users to provide
this info?


Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

