data storage location

Donald Becker becker at
Mon Sep 15 14:12:03 EDT 2003

On Mon, 15 Sep 2003 hanzl at wrote:

> >   A read-only data set allows many file handling options.
> >   If files are updated as a whole, you may use a caching versioned file
> >     system.  That is specialized, but provides many opportunities for
> >     optimization 
> > ...
> >
> > > Cache-like behavior would save a lot of manual work but unfortunately
> > > I am not aware of any working solution for linux,
> > 
> > Several exist.  It depends on the semantics you need, as doing this
> > efficiently requires making assumptions about the access patterns and
> > desired semantics.
> Please what do you include in these 'several'? Am I right that none of
> them is opensource AND freely downloadable ?

You've instantly moved from a technical agenda to a political one.
"Open Source" is a technical attribute, while "free" and "no cost online
access" are different matters.

The approaches I am talking about are a mix of
  - the Transparent File System, originally intended for caching in
    front of mounted CDs, but improved to sit in front of NFS.  It was
    first implemented in Linux around 1993, but I haven't seen any
  - several academic projects, two of which I've seen at recent
    LinuxWorlds.  There was an interesting related paper at
    ClusterWorld, which focused on co-servers -- using peers instead of
    the main file server.
  - and, of course, the implementation that we have.

> I try hard to find out what does this list in your head include,
> please correct my thoughts:

The challenge here is that the kind of file server I'm talking about
does not cleanly fit with Unix end-user semantics.  Thus they are often
hidden or not considered a file system.  Specifically, the following
conflict with Unix expectations:
  - Using versions while allowing multiple open versions with the same name.
  - Prohibiting anything but whole-file updates.
  - Not providing consistent readdir() and access() results, as there is
    no fixed list of files.
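
To make the first two points concrete, here is a minimal sketch of
whole-file, versioned updates.  It is an illustration only, not the code of
any system mentioned in this message; the class VersionedStore, its methods,
and the ".vN" naming scheme are all hypothetical.

```python
import os
import tempfile


class VersionedStore:
    """Toy store (hypothetical): every update writes a new immutable version."""

    def __init__(self, root):
        self.root = root
        self.latest = {}  # name -> newest published version number

    def _path(self, name, version):
        return os.path.join(self.root, f"{name}.v{version}")

    def put(self, name, data):
        """Whole-file update: write the new version aside, then publish it."""
        version = self.latest.get(name, 0) + 1
        tmp = self._path(name, version) + ".tmp"
        with open(tmp, "wb") as f:
            f.write(data)
        os.replace(tmp, self._path(name, version))  # atomic publish
        self.latest[name] = version
        return version

    def open(self, name):
        """Open the newest version; the handle stays bound to that version."""
        return open(self._path(name, self.latest[name]), "rb")


store = VersionedStore(tempfile.mkdtemp())
store.put("dataset", b"first")
old = store.open("dataset")        # reader holding version 1
store.put("dataset", b"second")    # whole-file update creates version 2
new = store.open("dataset")        # same name, different version
old_data, new_data = old.read(), new.read()
old.close()
new.close()
```

Both handles were opened under the name "dataset", yet each reads the
version that was current when it was opened.  That is the behavior that
conflicts with the usual Unix expectation of one file per name.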

> - you certainly did not qualify InterMezzo and Lustre as working

That's a political minefield.

> - old dead projects like ClusterNFS or amd hacks are also out...

Those are new rules.  Caching file systems are very ad hoc, and thus each
is used for only one purpose.  Without a broad user base there is little
motivation to make them general or keep them current.  That doesn't mean
they don't exist.

And yes, some of the systems I'm referring to are related to automount
daemons (AMD).  I'm especially familiar with those, as many are
implemented with NFS servers.  I wrote one of the first user-level NFS
servers in the late '80s, and that NFS code became the basis for
implementing several quirky, ad hoc filesystems.  (For those that don't
know how AMDs work, they mount a server or create a file in a local
directory and return a symbolic link pointing to it.)
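
The lookup-then-symlink step can be sketched locally as below.  This is only
an analogy under stated assumptions: a real automounter answers NFS lookups,
while this function simply runs in-process.  The names automount_lookup,
populate, export_dir and staging_dir are hypothetical.

```python
import os
import tempfile


def automount_lookup(export_dir, staging_dir, name, populate):
    """Return a symlink for `name`, creating its target on first use."""
    target = os.path.join(staging_dir, name)
    if not os.path.exists(target):
        populate(target)          # hypothetical hook: mount or fetch here
    link = os.path.join(export_dir, name)
    if not os.path.islink(link):
        os.symlink(target, link)  # the client follows this link to the data
    return link


def make_file(path):
    """Stand-in for the real work of mounting a server or creating a file."""
    with open(path, "wb") as f:
        f.write(b"hello")


export_dir = tempfile.mkdtemp()
staging_dir = tempfile.mkdtemp()
link = automount_lookup(export_dir, staging_dir, "readme", make_file)
with open(link, "rb") as f:
    content = f.read()
```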

One in particular was an FTP filesystem.  When a file block was requested,
the whole file was moved locally.  It had the advantage over most
caching systems of also being able to support readdir().
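
A minimal sketch of that fetch-the-whole-file-on-first-block behavior
follows.  It is not the original implementation: the class WholeFileCache
and the fetch callable are hypothetical, and a dictionary stands in for the
remote FTP server so the example runs on its own.

```python
import os
import tempfile


class WholeFileCache:
    """Toy cache (hypothetical): the first block read pulls the whole file."""

    def __init__(self, cache_dir, fetch):
        self.cache_dir = cache_dir
        self.fetch = fetch        # assumed signature: fetch(name) -> bytes
        self.fetch_count = 0

    def _local(self, name):
        # Flatten the remote path into a single local file name.
        return os.path.join(self.cache_dir, name.replace("/", "_"))

    def read_block(self, name, offset, size):
        local = self._local(name)
        if not os.path.exists(local):
            data = self.fetch(name)       # move the whole file locally
            self.fetch_count += 1
            tmp = local + ".tmp"
            with open(tmp, "wb") as f:
                f.write(data)
            os.replace(tmp, local)        # publish only complete files
        with open(local, "rb") as f:
            f.seek(offset)
            return f.read(size)


remote = {"pub/data.bin": b"0123456789"}  # stand-in for the remote server
cache = WholeFileCache(tempfile.mkdtemp(), remote.__getitem__)
first = cache.read_block("pub/data.bin", 2, 3)
second = cache.read_block("pub/data.bin", 7, 3)
```

The second read is served from the local copy, so the remote side is
contacted only once per file.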

Why use the NFS / automount approach?  Because
  - it's the only user-level interface to the filesystem, and
  - NFS is already quirky.

> Basically I just want confirmation for this one bit of information:
>   "There is no reliable opensource freely downloadable filesystem (or
>   addition to filesystem) for (contemporary) linux which would use local
>   harddisk as a cache, even for the most relaxed semantics one can
>   imagine (whole file caching, read-only, nearly no synchronisation)."

Donald Becker				becker at
Scyld Computing Corporation
914 Bay Ridge Road, Suite 220		Scyld Beowulf cluster system
Annapolis MD 21403			410-990-9993
