[Beowulf] New member, upgrading our existing Beowulf cluster

Bill Broadley bill at cse.ucdavis.edu
Thu Dec 3 21:19:08 EST 2009


Greg Lindahl wrote:
> On Fri, Dec 04, 2009 at 12:57:07PM +1100, Chris Samuel wrote:
> 
>> If you've got a job running on there for a month
>> or two then there's a fairly high opportunity cost
>> involved.
> 
> That kind of policy has a fairly high opportunity cost, even before
> you factor in linked nodes. E.g. you see a system disk going bad, but
> the user will lose all their output unless the job runs for 4 more
> weeks...

Indeed.  You'd hope that such long-running jobs would checkpoint.  This seems
like the perfect place for virtualization, since for mostly CPU-bound jobs the
overhead is getting pretty low.  Then you get all kinds of benefits:
* Checkpointing
* Migration
* Easy backfill

I'd expect it to be really popular with the admins.  Is anyone doing this?
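
For comparison, here is a rough sketch of what checkpointing looks like when
the job does it itself rather than relying on a VM snapshot.  This is
application-level checkpointing, not the virtualization approach above, and
everything in it is made up for illustration: the names save_checkpoint,
load_checkpoint and run, the JSON state layout, and the stop_at parameter
(which only simulates the job being killed partway through).

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write state to path via a temp file plus rename, so a crash
    mid-write leaves the previous checkpoint intact."""
    dir_name = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=dir_name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())  # push the data to disk before renaming
        os.replace(tmp, path)     # replaces the old checkpoint in one step
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


def load_checkpoint(path):
    """Return the last saved state, or a fresh state if none exists."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {"step": 0, "total": 0}


def run(path, n_steps, every=100, stop_at=None):
    """Toy long-running job: sums 0..n_steps-1, saving every `every` steps.

    stop_at (hypothetical, for demonstration only) abandons the run at that
    step without saving, as if the node had died.
    """
    state = load_checkpoint(path)
    while state["step"] < n_steps:
        if stop_at is not None and state["step"] >= stop_at:
            return state  # simulated failure: work since last save is lost
        state["total"] += state["step"]
        state["step"] += 1
        if state["step"] % every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state
```

Calling run() again with the same path after a failure picks up from the last
saved step, so at most `every` steps of work are repeated.  The catch is that
every application has to be written this way, which is the part a VM-level
checkpoint would take off the user's hands.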
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf


