data storage location
becker at scyld.com
Mon Sep 15 14:12:03 EDT 2003
On Mon, 15 Sep 2003 hanzl at noel.feld.cvut.cz wrote:
> > A read-only data set allows many file handling options.
> > If files are updated as a whole, you may use a caching versioned file
> > system. That is specialized, but provides many opportunities for
> > optimization
> > ...
> > > Cache-like behavior would save a lot of manual work but unfortunately
> > > I am not aware of any working solution for linux,
> > Several exist. It depends on the semantics you need, as doing this
> > efficiently requires making assumptions about the access patterns and
> > desired semantics.
> Please what do you include in these 'several'? Am I right that none of
> them is opensource AND freely downloadable ?
You've instantly moved from a technical agenda to a political one.
"Open Source" is a technical attribute, while "free" and "no cost online
access" are different matters.
The approaches I am talking about are a mix of
- the Transparent File System, originally intended for caching in
  front of mounted CDs, but improved to sit in front of NFS. It was
  first implemented in Linux around 1993, but I haven't seen any
- several academic projects, two of which I've seen at recent
  LinuxWorlds. There was an interesting related paper at
  ClusterWorld, which focused on co-servers -- using peers instead of
  the main file server.
- and, of course, the implementation that we have.
> I try hard to find out what does this list in your head include,
> please correct my thoughts:
The challenge here is that the kind of file server I'm talking about
does not cleanly fit with Unix end-user semantics. Thus such servers are
often hidden or not considered file systems. Specifically, the following
conflict with Unix expectations:
- Using versions while allowing multiple open versions with the same name.
- Prohibiting anything but whole-file updates.
- Not providing consistent readdir() and access() results, as there is
  no fixed list of files.
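
To make the first two points concrete, here is a minimal sketch of
versioned, whole-file-only semantics. It is an illustration under my own
assumptions, not any of the implementations mentioned in this thread: the
class name VersionedStore, its methods, and the "name.vN" on-disk layout
are all hypothetical.

```python
import os
import tempfile


class VersionedStore:
    """Toy whole-file, versioned store (hypothetical names and layout)."""

    def __init__(self, root):
        self.root = root
        self.latest = {}  # name -> newest published version number

    def _path(self, name, version):
        # Assumed layout: each version is a separate file "name.vN".
        return os.path.join(self.root, f"{name}.v{version}")

    def publish(self, name, data):
        """Whole-file update: write a complete new version, never patch in place."""
        version = self.latest.get(name, 0) + 1
        final = self._path(name, version)
        tmp = final + ".tmp"
        with open(tmp, "wb") as f:
            f.write(data)
        os.replace(tmp, final)  # readers never observe a partial file
        self.latest[name] = version
        return version

    def open_latest(self, name):
        """Open the newest version; the returned handle stays pinned to it."""
        return open(self._path(name, self.latest[name]), "rb")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        store = VersionedStore(root)
        store.publish("dataset", b"first")
        old = store.open_latest("dataset")
        store.publish("dataset", b"second")
        new = store.open_latest("dataset")
        # Two open handles, same name, different versions.
        print(old.read(), new.read())
        old.close()
        new.close()
```

Both handles were opened under the name "dataset", yet each sees a
different complete version. That is exactly the behavior a Unix user does
not expect from a single path name.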
> - you certainly did not qualify InterMezzo and Lustre as working
That's a political minefield.
> - old dead projects like ClusterNFS or amd hacks are also out...
Those are new rules. Caching file systems are very ad hoc, and thus
used for only one purpose. There is no broad user base to motivate
making them general or keeping them current. That doesn't mean they
don't exist.
And yes, some of the systems I'm referring to are related to automount
daemons (AMD). I'm especially familiar with those, as many are
implemented with NFS servers. I wrote one of the first user-level NFS
servers in the late '80s, and that NFS code became the basis for
implementing several quirky, ad hoc filesystems. (For those that don't
know how AMDs work, they mount a server or create a file in a local
directory and return a symbolic link pointing to it.)
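
The symbolic-link trick in the parenthetical above can be sketched roughly
as follows. This is only a user-space simulation under stated assumptions:
a real automounter does this in response to NFS lookup requests, and the
function name lookup, its parameters, and the materialize callback are
hypothetical names introduced for illustration.

```python
import os
import tempfile


def lookup(visible_dir, backing_dir, name, materialize):
    """Return the symlink target for `name`, creating the backing file on first use.

    `materialize` is a hypothetical callable(name) -> bytes standing in for
    "mount a server or create a file in a local directory".
    """
    target = os.path.join(backing_dir, name)
    link = os.path.join(visible_dir, name)
    if not os.path.lexists(link):
        # First reference: make the data available locally ...
        with open(target, "wb") as f:
            f.write(materialize(name))
        # ... and expose it through a symbolic link in the visible directory.
        os.symlink(target, link)
    # The client is handed the link target and follows it itself.
    return os.readlink(link)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as vis, tempfile.TemporaryDirectory() as back:
        print(lookup(vis, back, "data.txt", lambda n: b"hello"))
```

After the first lookup the daemon is out of the data path: reads go
straight to the linked file.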
One specifically was an FTP filesystem. When a file block was requested,
the whole file was moved locally. It had the advantage over most
caching systems of also being able to support readdir().
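
The "fetch the whole file on the first block request" behavior can be
sketched as below. This is not the FTP filesystem's actual code; the class
WholeFileCache, the fetch_whole_file callback (which stands in for the FTP
transfer), and the flat cache naming are assumptions made for illustration.

```python
import os
import tempfile


class WholeFileCache:
    """Toy whole-file cache: any block read pulls the entire file locally first."""

    def __init__(self, cache_dir, fetch_whole_file):
        self.cache_dir = cache_dir
        self.fetch = fetch_whole_file  # hypothetical: callable(name) -> bytes
        self.fetch_count = 0           # number of whole-file transfers so far

    def _local_path(self, name):
        # Naive flattening of remote paths; a real cache must avoid collisions.
        return os.path.join(self.cache_dir, name.replace("/", "_"))

    def read_block(self, name, offset, size):
        path = self._local_path(name)
        if not os.path.exists(path):
            # Cache miss: transfer the whole file, then publish it atomically.
            data = self.fetch(name)
            self.fetch_count += 1
            tmp = path + ".tmp"
            with open(tmp, "wb") as f:
                f.write(data)
            os.replace(tmp, path)
        # Hit or freshly filled: serve the requested block from local disk.
        with open(path, "rb") as f:
            f.seek(offset)
            return f.read(size)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        cache = WholeFileCache(d, lambda name: b"0123456789")
        print(cache.read_block("pub/a.dat", 2, 3))
        print(cache.read_block("pub/a.dat", 8, 4))
        print(cache.fetch_count)
```

The second read is served entirely from the local copy, so only one
transfer takes place regardless of how many blocks are later requested.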
Why use the NFS / automount approach? Because
- it's the only user-level interface to the filesystem
- NFS is already quirky
> Basically I just want confirmation for this one bit of information:
> "There is no reliable opensource freely downloadable filesystem (or
> addition to filesystem) for (contemporary) linux which would use local
> harddisk as a cache, even for the most relaxed semantics one can
> imagine (whole file caching, read-only, nearly no synchronisation)."
Donald Becker becker at scyld.com
Scyld Computing Corporation http://www.scyld.com
914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system
Annapolis MD 21403 410-990-9993