AW: mulitcast copy or snowball copy

Rene Storm rene.storm at
Mon Aug 18 11:34:16 EDT 2003

Hi Donald,

> I want to distribute large files over a cluster.
How large?  Some people think that 1MB is large, while others consider
large files to be 2GB+ (e.g. "Large File Summit").  This will have a
significant impact on how you copy the file.

Rene: Yes, I think 1 mb is large, but I have to copy files upto 2GB
each. (Overall 30 GB)
	And the cluster is 128++ nodes.

Here is an example: The Intel EEPro100 design configures the multicast
filter with a special command appended to the transmit command queue.
The command is followed by a list of the multicast addresses to accept.
While the command is usually queued to avoid delaying the OS, the chip
makes an effort to keep the Rx side synchronous by turning off the
receiver while it's computing the new multicast filter.  So the longer
the multicast filter list and the more frequently it is changed, the
more packets dropped.  And what's the biggest performance killer with
multicast?  Dropped packets..

Rene: Thats right, but what if I ignore dropped packets and accept the
corrupt files ?
	I would be able to rsync them later on.
	First Multicast to create files, Second step is to compare with
	I've tried this and it isn't really slow, if you're doing the
rsync via snowball.

If you are doing this for yourself, the solution is easy.
Try the different approaches and stop when you find one that works for
you. If you are building a system for use by others (as we do), then the
problem becomes more challenging.  

Rene: That's the problem with all the things you do, first they are for
your own and then everybody wants them ;o)

> but there are some heavy problems.
> e.g.
> What happens if one node (in the middle) is down.

Good: first consider the semantics of failure.  That means both recovery
and reporting the failure.

My first suggesting is that *not* implement a program that copies a file
to every available node.  Instead use a system where you first get a
list of available ("up") nodes, and then copy the files to that node
list.  When the copy completes continue to use that node list rather
then letting jobs use newly-generated "up" lists.

Rene: Good idea

You can implement low-overhead fault checking by counting down job
issues and job completion.  As the first machine falls idle, check that
the final machine to assign work is still running.  As the next-to-last
job completes, check that the one machine still working is up.

Rene: But how do I get this status back to my "master", e.g command from
master: node16 copy to node17? 
	I don't want do de-centralize my job, like fire and forget.



Beowulf mailing list, Beowulf at
To change your subscription (digest mode or unsubscribe) visit

More information about the Beowulf mailing list