[Beowulf] Hadoop

Greg Lindahl lindahl at pbm.com
Mon Dec 29 17:09:34 EST 2008


On Fri, Dec 26, 2008 at 05:16:04PM -0600, Gerry Creager wrote:

> We've a user who has requested its installation on one of our clusters,  
> a high-throughput system.

You didn't say anything about what they wanted to do. Hadoop is
designed to store a lot of data, and then enable what we HPC people
would call nearly-embarrassingly-parallel computation with good
locality -- it schedules shards of the mapreduce computation to run on
the same systems that hold the disk shards being processed.
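To make the locality point concrete, here is a toy sketch of data-local task placement. It is not Hadoop's actual scheduler. The function name assign_map_tasks, the block-to-replica map, and the per-node slot limit are all hypothetical, chosen only for illustration. Each map task (one per block) goes to the least-loaded node that already holds a replica of that block, and falls back to a remote node only when no local slot is free:

```python
def assign_map_tasks(block_locations, nodes, slots_per_node):
    """Hypothetical locality-aware assignment: one map task per block.

    block_locations: dict of block id -> list of nodes holding a replica
    nodes: all node names in the cluster
    slots_per_node: max concurrent tasks per node (assumed limit)
    Returns (dict of block id -> chosen node, count of data-local tasks).
    """
    load = {n: 0 for n in nodes}
    assignment = {}
    local = 0
    for block, replicas in sorted(block_locations.items()):
        # Prefer the least-loaded node whose own disk already holds this block.
        candidates = [n for n in replicas if load[n] < slots_per_node]
        if candidates:
            node = min(candidates, key=lambda n: load[n])
            local += 1
        else:
            # No local slot free: run elsewhere and read the block over the network.
            free = [n for n in nodes if load[n] < slots_per_node]
            if not free:
                raise RuntimeError("not enough task slots for all blocks")
            node = min(free, key=lambda n: load[n])
        load[node] += 1
        assignment[block] = node
    return assignment, local


blocks = {"b0": ["n1", "n2"], "b1": ["n2", "n3"],
          "b2": ["n1", "n3"], "b3": ["n3", "n1"]}
assignment, local = assign_map_tasks(blocks, ["n1", "n2", "n3"], slots_per_node=2)
print(local, "of", len(blocks), "tasks are data-local")  # 4 of 4 tasks are data-local
```

The consequence for a shared cluster is that the computation has to go wherever the data already sits, rather than wherever the queue system happens to have free nodes.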

This means you'll have to dedicate systems over the long term to store
the data (much like PVFS), and all of these systems will have to be a
part of their mapreduce jobs. So if your queue system can run
whole-cluster jobs easily, no problem.
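Why every storage node ends up in every job: a large file is split into many blocks that are replicated across the storage nodes, so any job that reads the whole file has data (and therefore data-local tasks) on nearly all of them. The sketch below assumes a simple round-robin placement (replica k of block i on node (i + k) mod N), a 64 MB block size, and 3-way replication. These are illustrative assumptions; real HDFS placement is rack-aware and differs in detail.

```python
def place_blocks(num_blocks, num_nodes, replication=3):
    """Toy placement (assumption): replica k of block i goes on node (i + k) % num_nodes."""
    return {i: [(i + k) % num_nodes for k in range(replication)]
            for i in range(num_blocks)}


def nodes_touched(placement):
    """Set of nodes holding at least one replica of some block of the file."""
    return {node for replicas in placement.values() for node in replicas}


file_gb = 64
block_mb = 64                               # assumed block size
num_blocks = file_gb * 1024 // block_mb     # 1024 blocks
placement = place_blocks(num_blocks, num_nodes=100)
print(len(nodes_touched(placement)))        # 100: every node holds part of the file
```

A tiny file of two blocks would touch only four nodes, but the data sets Hadoop is meant for are large enough to cover the whole storage pool.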

If, instead, they're just looking for a simple way to do
embarrassingly parallel computations, without lots of persistent data,
then you can probably point them at something easier and more friendly
to your queue system.

-- greg


_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf