[Beowulf] Recommendations for new cluster

Andrew Wang andrewxwang at yahoo.com.tw
Sat Jan 10 00:20:10 EST 2004

--- Message from Rich Pickler <rpickler at dnahelix.com>:
> Instead of going with a render system tailored 
> specifically for the renderer we're using, like
> Alfred for Pixar's RenderMan http://www.pixar.com/,

You are not alone; Framestore CFC is doing the same
thing. They are looking at SGE 6.0 as a replacement.

You can follow the mail thread -- "SGE Renderfarm":

(Since you are in the graphics industry, can you tell
me whether Framestore is big? People on the SGE list
seemed so excited.)

> And I would rather have real world experiences
> with tools instead of someone trying to sell me
> something.

Use open source, and no one will ask you for money.

> -What batch system do you recommend for our purposes
> (Condor, PBSPro, ScalablePBS, SGE)?

SGE has array jobs; the others do not.

An array job is submitted as a single job, but each
element of the array takes different inputs. It is
much faster to submit one array job than to submit
each element separately, and you can stop or delete
the whole array job or just an individual element.

AFAIK, this array job feature makes job management
easy for graphics rendering and certain types of
bio-computation -- e.g. different frames become
elements of the same job array.
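As a sketch (assuming a standard SGE install; render_frame.sh is a hypothetical wrapper around whatever renderer you use), a 100-frame render submitted as one array job looks roughly like:

```shell
# Submit frames 1-100 as a single array job; SGE runs each
# element with $SGE_TASK_ID set to its task number.
qsub -t 1-100 render_frame.sh

# render_frame.sh would do something like:
#   renderer --frame "$SGE_TASK_ID" scene.rib

# Delete the whole array job, or just one element
# (1234 stands in for the job id qsub reported):
qdel 1234        # whole array
qdel 1234.42     # element 42 only
```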
> -What monitoring tools and set up tools have you
> found and used that reduce down time on faulty nodes
> (ganglia, management tools, etc.)?

ganglia is good, and a lot of people use it. I just
think the web page part is overkill -- you need to run
Apache just to monitor the cluster -- but that's just
my humble opinion.

If you prefer command line tools, you can use qhost in
SGE to see which hosts are down.
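For example (assuming SGE's client tools are on your PATH):

```shell
# List all execution hosts with their load and memory; a host
# that is down shows "-" in the load/memory columns.
qhost

# Or show the full queue listing, including queue instances
# that are in an alarm or unreachable state:
qstat -f
```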

> I've only found stuff for SGE, and a rumor that
> PBSPro will add this feature soon)

Yes, PBSPro 5.4 will support it.

> So far, PBS looks like the most widely used and
> known system out there, so some variant of that
> seems ideal for finding prospective employees with 
> cluster experience,  

I find PBS is used more in HPC clusters (MPI, PVM
jobs), and SGE is used more in enterprise and
biocomputation clusters.

The problem with SGE when handling MPI jobs is that it
takes time to set up the PE (parallel environment).
PBS has a much nicer external interface, and people
have already written mpiexec to take advantage of it.

> I have yet to find any documentation or rumor 
> of SGE's upper bound.

SGE scales quite well. I haven't seen people
complaining about it -- maybe that's why you couldn't
find anything.

Just to let you know, Forecast Systems Laboratory has
a 1536-CPU cluster running SGE. They gave a
presentation about it last year.

SGE 6.0 (coming this year) will scale even further
(e.g. a multithreaded SGE, and a DBMS to handle job
spooling); you can take a look at Sun's presentations.


> One of the problems I have noticed is that some
> batch managers bog down with a large amount of 
> jobs.

Make sure you have lots of memory in the head node,
and no swapping.

I once tested how SGE behaves when there are tons of
submitted jobs. I went up to 100,000 and it was still
running, though of course that was just a toy cluster
with several hosts. Still, I think 20,000-40,000 jobs
can be handled very easily.
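A toy version of that experiment (assuming you have an SGE installation to point it at; `qsub -b y` submits a binary directly, without a wrapper script):

```shell
# Flood the qmaster with tiny do-nothing jobs and watch
# how it copes.
for i in $(seq 1 1000); do
    qsub -b y /bin/true
done

# A rough count of pending/running jobs:
qstat | wc -l
```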

> Any thoughts what might be the best solution for us?

I think the best thing is to try it yourself, as it
all depends on what you want to do.

Download the SGE binaries (or the source, if you
really want to) and install it on one of your test
machines. It doesn't require root access, it can be
placed in any directory, and it's free.

Also, I highly recommend the mailing lists; a lot of
people are on them and can help solve problems.

As in the "SGE Renderfarm" thread, people exchange
experiences and share knowledge there all the time.


Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
