[Beowulf] dedupe filesystem

Ashley Pittman ashley at pittman.co.uk
Sun Jun 28 13:19:50 EDT 2009


On Thu, 2009-06-25 at 13:09 -0500, Rahul Nabar wrote:

> On Tue, Jun 2, 2009 at 12:39 PM, Ashley Pittman <ashley at pittman.co.uk>
> wrote:
>         Fdupes scans the filesystem looking for files where the size
>         matches, if it does it md5's them checking for matches and if
>         that matches it finally does a byte-by-byte compare to be 100%
>         sure.
> 
> Why is a full byte-by-byte comparison needed even after an md5 sum
> matches? I know there is a vulnerability in md5 but that's more of a
> security thing and by random chance super unlikely, right?

> Just curious....

Checksums are an (inherently imperfect) way of checking that two files
aren't different; they are not intended to, and cannot, prove that two
files are the same. A checksum is far shorter than the file it
summarises, so distinct files can share one.

If you relied on the md5 sum alone, sooner or later there could be a
collision, and a dedupe tool acting on it would treat two different
files as one, so you would lose data.
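
For illustration, here is a rough Python sketch of the size, then md5,
then byte-by-byte sequence described in the quoted text. It is not
fdupes' actual code; the names find_duplicates and md5_of are made up
for this example, and it assumes the files are ordinary readable files
under one directory tree.

```python
import filecmp
import hashlib
import os
from collections import defaultdict


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks (hypothetical helper)."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Return lists of paths under root whose contents are byte-identical."""
    # Stage 1: group by size; files of different sizes cannot be identical.
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    duplicate_sets = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue
        # Stage 2: group by MD5; differing digests prove the files differ.
        by_digest = defaultdict(list)
        for path in same_size:
            by_digest[md5_of(path)].append(path)
        for candidates in by_digest.values():
            # Stage 3: a matching digest is only a hint, so compare the bytes.
            confirmed = []
            for path in candidates:
                for group in confirmed:
                    if filecmp.cmp(group[0], path, shallow=False):
                        group.append(path)
                        break
                else:
                    confirmed.append([path])
            duplicate_sets.extend(g for g in confirmed if len(g) > 1)
    return duplicate_sets
```

The cheap tests only ever rule files out. Nothing is reported as a
duplicate until the final compare in stage 3 has confirmed it.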

Ashley Pittman,

-- 

Ashley Pittman, Bath, UK.

Padb - A parallel job inspection tool for cluster computing
http://padb.pittman.org.uk



