[Beowulf] Torrents for HPC

Jesse Becker beckerjes at mail.nih.gov
Mon Jun 11 14:02:43 EDT 2012

On Mon, Jun 11, 2012 at 01:49:23PM -0400, Joshua Baker-LePain wrote:
>On Fri, 8 Jun 2012 at 5:06pm, Bill Broadley wrote
>> Do you think it's worth bundling up for others to use?
>> This is how it works:
>> 1) User runs publish <directory> <name> before they start submitting
>>    jobs.
>> 2) The publish command makes a torrent of that directory and starts
>>    seeding that torrent.
>> 3) The user submits an arbitrary number of jobs that needs that
>>    directory.  Inside the job they "$ subscribe <name>"
>> 4) The subscribe command launches one torrent client per node (not per
>>    job) and blocks until the directory is completely downloaded
>> 5) /scratch/<user>/<name> has the users data
>> Not nearly as convenient as having a fast parallel filesystem, but seems
>> potentially useful for those who have large read only datasets, GigE and
>> NFS.
>> Thoughts?
>I would definitely be interested in a tool like this.  Our situation is
>about as you describe -- we don't have the budget or workload to justify
>any interconnect higher-end than GigE, but have folks who pound our
>central storage to get at DBs stored there.

I looked into doing something like this on a 50-node cluster to
synchronize several hundred GB of semi-static data used in /scratch.
I found that the time to build the torrent files--calculating checksums
and such--was *far* more time consuming than the actual file
distribution.  This is on top of the rather severe IO hit on the "seed"
box as well.  
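The cost is easy to see: building a torrent means hashing every piece,
i.e. one full sequential read of the dataset on the seed box before a
single byte is distributed.  A rough illustration (the file size and
paths are made up):

```shell
# Creating a torrent hashes every piece of the payload, so the
# "publish" step alone costs a full read of the data.  Simulate the
# IO with a scratch file and a whole-file checksum:
dd if=/dev/zero of=/tmp/payload.bin bs=1M count=32 2>/dev/null
time sha1sum /tmp/payload.bin   # comparable IO to per-piece hashing
rm -f /tmp/payload.bin
```

Scale that read up to several hundred GB on a busy fileserver and the
hashing pass alone can take longer than simply copying the data out.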

I fought with it for a while, but came to the conclusion that *for
_this_ data*, and how quickly it changed, torrents weren't the way to
go--largely because of the cost of creating the torrent in the first
place.
However, I do think that similar systems could be very useful, if
perhaps a bit less strict in their tests.  The peer-to-peer model is
useful, and (in some cases) a simple size/date check could be enough to
determine when to (re)copy a file.
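A minimal sketch of that size/date heuristic, assuming GNU stat; the
function name and file paths are hypothetical:

```shell
# Recopy only when the destination is missing or its size/mtime
# differ from the source -- no checksumming required.
needs_copy() {
    src=$1 dst=$2
    [ ! -e "$dst" ] && return 0     # missing: copy it
    [ "$(stat -c '%s %Y' "$src")" != "$(stat -c '%s %Y' "$dst")" ]
}
```

Usage would be something like
`needs_copy /data/refdb.fa /scratch/refdb.fa && cp -p /data/refdb.fa /scratch/`.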

One thing torrents don't handle is file deletions, which opens up a
few new problems.

Eventually, I moved to a distributed rsync tree, which worked for a
while but was slightly fragile.  We dropped the whole thing when we
purchased a sufficiently fast storage system.
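One thing rsync does get right here is deletions: `--delete` mirrors
removals as well as additions, which a torrent-based copy cannot
express.  A quick local demonstration (directory names are examples):

```shell
# rsync -a --delete makes dst/ an exact mirror of src/, removing
# files that no longer exist upstream.
mkdir -p src dst
touch src/keep dst/keep dst/stale   # "stale" was deleted upstream
rsync -a --delete src/ dst/
ls dst/                             # only "keep" remains
rm -rf src dst
```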

Jesse Becker
NHGRI Linux support (Digicon Contractor)
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

