[Beowulf] dedupe filesystem

Mark Hahn hahn at mcmaster.ca
Fri Jun 5 09:52:55 EDT 2009

>> have tiered storage today, but in the future i can see a need to have
>> a storage pool with SATA and a storage pool with SAS or faster drives
>> in it.

IMO, this is a dubious assertion.  I bought a couple of incredibly cheap
desktop disks for home use a couple of weeks ago: just seagate 7200.12's.
these are of the latest 500G/platter generation, so have the high density
and thus bandwidth:

sure, your application may require low-latency.  but bandwidth is easy.

>> Some of the researchers where I am, work on data for months.

my organization's current policy is to be fairly stingy with /home and /work,
neither of which has a timeout.  /scratch currently has a 1-month timeout,
which unfortunately tends to be too short to encourage use.

>> Is this something better solved with pre/post-amble copies or through
>> policies?

we currently have a periodic crawler that collects data on each filesystem,
hashing each file so that people can't game the timeouts with touch.
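
a rough sketch of the idea, not the actual crawler: the function names, the
in-memory state dict and the choice of SHA-1 are all assumptions made for
illustration.  the point is only that a file's age is tracked from the last
time its contents changed, so bumping mtime with touch doesn't reset the clock.

```python
import hashlib
import os
import time


def file_digest(path, chunk_size=1 << 20):
    """Hash a file's contents in chunks (SHA-1 is an arbitrary choice here)."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def crawl(root, previous, now=None):
    """Walk root and return {path: (digest, last_content_change)}.

    previous is the state from the last crawl (hypothetical format).
    A file whose digest is unchanged keeps its old timestamp, whatever
    its mtime says, so touch alone does not refresh it.
    """
    now = time.time() if now is None else now
    current = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                digest = file_digest(path)
            except OSError:
                continue  # unreadable or vanished mid-crawl
            old = previous.get(path)
            if old is not None and old[0] == digest:
                current[path] = old
            else:
                current[path] = (digest, now)
    return current


def expired(state, timeout_seconds, now=None):
    """Paths whose contents have not changed within the timeout."""
    now = time.time() if now is None else now
    return sorted(
        path
        for path, (_digest, changed) in state.items()
        if now - changed > timeout_seconds
    )
```

in practice the state would be kept on disk between runs, and hashing every
file on a large filesystem is expensive, so a real crawler would probably need
to be smarter about what it rereads.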

> The best of both worlds would certainly be a central, fast storage filesystem,
> coupled with a hierarchical storage management system.

I'm not sure - is there some clear indication that one level of storage is 
not good enough?

> Oh wait, it might exist already... Well, at least it's in the works: Sun and
> CEA are working on implementing such an HSM for Lustre 2.0. See
> http://wiki.lustre.org/images/8/8b/AurelienDegremont.pdf for details.

this seems like a bad design to me.  I would think (and I'm reasonably
familiar with Lustre, though not an internals expert) that if you're going to 
touch Lustre interfaces at all, you should simply add cheaper, higher-density
OSTs, and make more intelligent placement/migration heuristics.  I guess that 
CEA already has a vast investment in some existing HSM, so can't do this.

regards, mark hahn