[Beowulf] Re: dedupe filesystem

Dave Love d.love at liverpool.ac.uk
Mon Jun 29 08:30:31 EDT 2009


Ashley Pittman <ashley at pittman.co.uk> writes:

> If you relied on the md5 sum alone there would be collisions and those
> collisions would result in you losing data.

The question is whether the probability of a collision is high compared
with other causes of data loss -- presumably hardware, assuming no-one puts
figures on software reliability.  As far as I remember, the calculation for
SHA-1 in Plan 9's Venti¹, which no-one seems to have mentioned, says you can
ignore collisions for petabyte filesystems.
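
For a rough feel for the numbers, here is a minimal sketch of the usual
birthday-bound estimate.  It is not the calculation from the Venti paper:
the 8 KiB block size, the 1 PiB store size and the function name are
assumptions chosen for illustration, and it treats hash outputs as
uniformly random, so it only models accidental collisions, not
deliberately constructed ones.

```python
import math


def collision_probability(num_blocks: int, hash_bits: int) -> float:
    """Birthday-bound estimate: p ~= 1 - exp(-n(n-1) / 2^(b+1)).

    Assumes hash outputs are uniformly random, so this covers accidental
    collisions only.
    """
    exponent = -num_blocks * (num_blocks - 1) / 2.0 ** (hash_bits + 1)
    # expm1 keeps precision when the probability is far below 1e-16.
    return -math.expm1(exponent)


if __name__ == "__main__":
    store_bytes = 2 ** 50   # assumed store size: 1 PiB
    block_bytes = 8 * 1024  # assumed block size: 8 KiB
    num_blocks = store_bytes // block_bytes

    for name, bits in (("MD5", 128), ("SHA-1", 160)):
        p = collision_probability(num_blocks, bits)
        print(f"{name:5s} ({bits} bits): p ~= {p:.3e}")
```

The figure to set this against is whatever undetected-error rate you
believe for the disks, controllers and memory in the same system.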

Ob-Beowulf:  You can run Venti on GNU/Linux,² but I don't know how the
current implementation performs.  Also, GlusterFS has a `data
de-duplication translator' on its roadmap, which I didn't see mentioned.

--
1. http://plan9.bell-labs.com/sys/doc/venti/venti.html
2. http://swtch.com/plan9port/

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf


