[Beowulf] Recommendations for new cluster

Fri Jan 9 16:59:54 EST 2004

We're looking a few months into the future and are beginning to design
our next renderfarm for production at our studio.  Instead of going with
a render system tailored specifically for the renderer we're using, like
Alfred for Pixar's RenderMan (http://www.pixar.com/), we're toying with
the idea of a more general-purpose cluster system that would open up a
lot of possibilities for what we can do with the farm.  However, having
only tinkered with general-purpose cluster systems in college, I'm a bit
new to some of the details.  After getting all I can out of Google, I
still have some questions about what exactly is right for us, and I
would rather hear about real-world experience with these tools than
listen to someone trying to sell me something.

In a nutshell, the questions we have:

-What batch system do you recommend for our purposes (Condor, PBSPro,
ScalablePBS, SGE)?

-What monitoring and setup tools have you used that reduce downtime
caused by faulty nodes (Ganglia, management tools, etc.)?

-Failover of the head node is also important to us, but I haven't
researched this topic enough to start asking questions.  (So far,
I've only found material for SGE, and a rumor that PBSPro will add
this feature soon.)

A bit of detail on our requirements:

First, we're looking at a cluster of around 400 CPUs, which is getting
dangerously close to the limits of OpenPBS.  Since this number could
easily grow down the road, we need to push that limit as far out as
possible.  So far, PBS looks like the most widely used and best-known
system out there, so some variant of it seems ideal for finding
prospective employees with cluster experience, although it looks like
a lot of you work with SGE as well.  PBSPro and ScalablePBS both
advertise scalability limits much higher than what we need, but PBSPro
is much more expensive.  I have yet to find any documentation, or even
a rumor, about SGE's upper bound.

The majority of jobs in our queue will be small and single-node (but
definitely multithreaded to take advantage of dual-CPU nodes), so
communication between nodes will be almost nil.  There will be cases
where processes need to communicate with workstations outside the
cluster, but that isn't terribly difficult to design around.

One of the problems I have noticed is that some batch managers bog down
under a large number of jobs.  Given the structure of our jobs, we can
easily wind up with 20,000-40,000 jobs in a day, many of them submitted
to the batch system at around the same time (when everybody leaves
work).

Any thoughts on what might be the best solution for us?

Thanks for any recommendations you can give me, and feel free to ask me
any more details you want.

Rich

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org