data storage location

Ashley Pittman ashley at pittman.co.uk
Thu Sep 11 13:03:30 EDT 2003


> The alternative approach is to keep copies of the data on local disk on
> each node. This gives you good IO rates, but you then have a substantial
> data management problem: how do you copy 100Gb to each node in your
> cluster in a sensible amount of time, and how do you update the data and
> make sure it is kept consistent?
> 
> The commonest approach to data distribution is to do some sort of
> cascading rsync/rcp which follows the topology of your network.
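The cascade idea can be sketched as a schedule computation. This is a minimal sketch assuming a flat network in which any node can copy to any other and node 0 starts with the data. A real cascading rsync would map the tree onto the switch topology, which this ignores. `cascade_schedule` is a hypothetical helper: it only works out who copies to whom in each round and does not run rsync or rcp.

```python
def cascade_schedule(n_nodes):
    """Return the copy rounds for a cascading (binomial-tree) distribution.

    Node 0 starts with the data. In every round each node that already
    holds a copy sends it to one node that does not, so the number of
    nodes holding the data doubles per round. Each round is a list of
    (src, dst) node-index pairs.
    """
    rounds = []
    have = 1  # nodes 0 .. have-1 already hold the data
    while have < n_nodes:
        copies = [(src, src + have) for src in range(have) if src + have < n_nodes]
        rounds.append(copies)
        have *= 2
    return rounds


if __name__ == "__main__":
    # Print who copies to whom for a hypothetical 6-node cluster.
    for i, copies in enumerate(cascade_schedule(6)):
        print(f"round {i}: " + ", ".join(f"node{s} -> node{d}" for s, d in copies))
```

Because the number of nodes holding the data doubles each round, 32 nodes are covered in 5 rounds rather than the 31 one-at-a-time copies a single server would make, at the cost of every intermediate node also acting as a sender.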

I've often wondered why there isn't some kind of 'mpicp' program to do
just this.

I'd imagine the command line being something like:

$ mpirun -allcps mpicp node0:~/myfile.dat /tmp/

This would then use MPI_Bcast to send the data to all the nodes. The
assumption here is that MPI_Bcast is already reasonably efficient, so
it's better to use it than to write your own cascading rsync algorithm.
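A minimal sketch of the proposed mpicp, written against mpi4py's communicator interface (an assumption; the post names no implementation). `mpicp` and its `chunk_size` parameter are hypothetical. Rank 0 reads the file in chunks and each chunk goes out with `comm.bcast`, mpi4py's pickle-based broadcast; only rank 0 needs to read the source, and every rank, rank 0 included, writes the copy into `dest_dir`.

```python
import os


def mpicp(comm, src, dest_dir, chunk_size=4 * 1024 * 1024):
    """Copy `src` from rank 0 into `dest_dir` on every rank of `comm`.

    Hypothetical sketch of the proposed mpicp. `comm` is assumed to look like
    an mpi4py communicator: Get_rank() and the pickle-based bcast(obj, root=0).
    Only rank 0 reads `src`; every rank writes dest_dir/basename(src).
    """
    rank = comm.Get_rank()
    dest = os.path.join(dest_dir, os.path.basename(src))
    # If rank 0 would overwrite its own source file, just skip writing there.
    skip_write = rank == 0 and os.path.abspath(src) == os.path.abspath(dest)
    reader = open(src, "rb") if rank == 0 else None
    writer = None if skip_write else open(dest, "wb")
    try:
        while True:
            # Rank 0 reads the next chunk (b"" at end of file); others pass None.
            chunk = reader.read(chunk_size) if rank == 0 else None
            # Every rank receives rank 0's chunk from the broadcast.
            chunk = comm.bcast(chunk, root=0)
            if not chunk:  # empty chunk: the whole file has been sent
                break
            if writer is not None:
                writer.write(chunk)
    finally:
        for f in (reader, writer):
            if f is not None:
                f.close()
    return dest
```

Launched under mpirun, every rank would call something like `mpicp(MPI.COMM_WORLD, "/home/user/myfile.dat", "/tmp/")` after `from mpi4py import MPI` (the path is illustrative; `~` is not expanded). Streaming in chunks keeps memory bounded even for a 100Gb file; for large transfers, mpi4py's buffer-based `Comm.Bcast` on a preallocated `bytearray` would avoid the pickling overhead of lowercase `bcast`. The sketch does no error handling: if rank 0 cannot open the source, the other ranks block in the broadcast.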

Ashley,

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org