multicast copy or snowball copy

Donald Becker becker at scyld.com
Mon Aug 18 13:31:17 EDT 2003


On Mon, 18 Aug 2003, Erik Arneson wrote:
> On Mon, Aug 18, 2003 at 11:26:17AM +0200, Rene Storm wrote:
> > Hi Beowulfers,
> >  
> > Problem:
> > I want to distribute large files over a cluster.
> > To raise performance I decided to copy the file to the local HD of each node in the cluster.
> >  
> > Has anyone found a multicast solution for this, or maybe something based on the snowball principle?
> 
> I am really new to the Beowulf thing, so I am not sure if this solution is a
> good one or not.  But have you taken a look at the various network
> filesystems?  OpenAFS has a configurable client-side cache, and if the files
> are needed only for reading this ends up being a very quick and easy way to
> distribute changes throughout a number of nodes.

This is a good example of why Grid/wide-area tools should not be
confused with local cluster approaches.  The time scale, performance and
complexity issues are very different.

AFS uses TCP/IP to transfer whole files from a server.  With multiple
servers the configuration is static or slowly changing.

> (However, I have noticed that network filesystems are not often mentioned in
> conjunction with Beowulf clusters, and I would really love to learn why.
> Performance?  Latency?  Complexity?)

It's because file systems are critically important to many applications.

There is no universal cluster file system, and thus no single solution.
The best approach is not to tie the cluster management, membership, or
process control to the file system in any way.  Instead, the file system
should be selected based on the application's need for consistency,
performance and reliability.  For instance, NFS is great for small,
read-only input files.  But using NFS for large files, or when any files
will be written or updated, results in both performance and consistency
problems.

When working from a large read-only database, explicitly pre-staging
(copying) the database to the compute nodes is usually better than
relying on an underlying FS.  It's easier, more predictable and more
explicit than tuning FS cache parameters per directory.
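To make the pre-staging idea concrete, here is a rough, untested sketch
(in Python) of a snowball-style push: every node that already holds the
file forwards it to one node that does not, so the number of parallel
copies roughly doubles each round.  The host names, the file path, and
the use of ssh/scp are placeholders, and it assumes passwordless ssh
between all the nodes.

#!/usr/bin/env python3
# Rough sketch of a "snowball" (tree) pre-stage: every host that already
# holds the file pushes it to one host that does not, so the number of
# parallel copies roughly doubles each round.
# Assumptions (placeholders, not a tested tool): passwordless ssh between
# all nodes, scp installed everywhere, hypothetical host names and path.
import subprocess

FILE_PATH = "/local/input.db"                    # hypothetical staging path
NODES = ["node%02d" % i for i in range(1, 17)]   # hypothetical compute nodes

def snowball_copy(src_host, nodes, path):
    """Push `path` from src_host to every host in `nodes`."""
    have = [src_host]        # hosts that already hold the file
    need = list(nodes)       # hosts still waiting for it
    while need:
        procs = []
        # Pair each current holder with one waiting host; copy in parallel.
        for sender in have[:len(need)]:
            target = need.pop(0)
            cmd = ["ssh", sender, "scp", path, "%s:%s" % (target, path)]
            procs.append((target, subprocess.Popen(cmd)))
        for target, proc in procs:
            if proc.wait() == 0:
                have.append(target)   # target can send in the next round
            else:
                need.append(target)   # failed copy: retry next round
                                      # (no retry limit in this sketch)

if __name__ == "__main__":
    snowball_copy("head", NODES, FILE_PATH)

With N nodes this finishes in roughly log2(N) rounds instead of N
sequential copies from the head node, and after the first round the head
node's link is no longer the bottleneck.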

As an example of why predictability is very important, imagine what
happens to an adaptive algorithm when a cached parameter file expires,
or a daemon wakes up and does a burst of work.  That machine is suddenly
slower, and its part of the problem now looks "harder".  So the work is
reshuffled, only to be shuffled back during the next time step.

-- 
Donald Becker				becker at scyld.com
Scyld Computing Corporation		http://www.scyld.com
914 Bay Ridge Road, Suite 220		Scyld Beowulf cluster system
Annapolis MD 21403			410-990-9993



