data storage location
becker at scyld.com
Sat Sep 13 07:56:38 EDT 2003
On Fri, 12 Sep 2003 hanzl at noel.feld.cvut.cz wrote:
> > The alternative approach is to keep copies of the data on local disk on
> > each node. This gives you good IO rates, but you then have a substantial
> > data management problem; how do you copy 100Gb to each node in your
> > cluster in a sensible amount of time, and how do you update the data and
> > make sure it is kept consistent?
One significant question is the access pattern and requirement.
Is there a single or multiple application access patterns?
Does the application read all or only a subset of the data?
Is the subset predictable, perhaps externally by a scheduler?
Does the application step linearly through the data, or randomly access?
If linear stepping, is the state space of the application small?
If small, as with a best-match search, the processing time per file
byte tends to be small. We recommend a static split of the data
across machines and migrating the process instead.
In the case of a single file read we can often do this without
modifying the application, or localize the changes to a few
lines around the read loop.
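The static-split idea above might be sketched as follows. This is a
hypothetical illustration, not code from the post: NUM_NODES, NODE_RANK,
and score() are all assumed names, with a plain list standing in for the
single data file. Only the lines around the read loop change; each node
scans its own shard and a tiny reduction step merges the per-node winners.

```python
# Minimal sketch of a best-match scan restricted to one node's shard.
# All names here (NUM_NODES, NODE_RANK, score) are assumptions; in a
# real cluster NODE_RANK would come from the scheduler.
NUM_NODES = 4        # assumed cluster size
NODE_RANK = 0        # assumed index of this node

records = [f"record-{i}" for i in range(100)]  # stand-in for the data set

def score(rec):
    # Stand-in for the per-byte processing, cheap as in a best-match search.
    return len(rec)

# The original loop would scan every record; the split touches only the
# loop's iteration, exactly the "few lines around the read loop" case.
best = max((r for i, r in enumerate(records) if i % NUM_NODES == NODE_RANK),
           key=score, default=None)
```

Running this once per rank and taking the max of the per-node results
reproduces the single-machine answer without moving any data.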
If large, e.g. building a tree in memory from the data, what is the
per-byte processing time? If that is still small, the same approach
applies: statically split the data across machines and migrate the
process.
How is the data set updated?
A read-only data set allows many file handling options.
If files are updated as a whole, you may use a caching versioned file
system. That is specialized, but provides many opportunities for
optimization.
Handling arbitrary writes in the middle of files requires a consistent file
system, and the cost for consistency is very high.
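The whole-file-update case above can be sketched as a versioned cache:
a node refetches a file only when its version tag changes. This is a
hypothetical illustration, not any particular file system; cache_get()
and the tag scheme are assumed, and in practice the tag would come from
metadata rather than rehashing the master copy.

```python
import hashlib
import os
import shutil

def cache_get(name, master_dir, cache_dir):
    """Return a path to a local, current copy of master_dir/name.

    Hypothetical sketch: files are versioned as a whole, so a simple
    content tag decides whether the cached copy is still valid.
    """
    src = os.path.join(master_dir, name)
    dst = os.path.join(cache_dir, name)
    with open(src, "rb") as f:
        tag = hashlib.sha1(f.read()).hexdigest()  # version tag (assumed scheme)
    tagfile = dst + ".tag"
    if os.path.exists(dst) and os.path.exists(tagfile):
        with open(tagfile) as f:
            if f.read() == tag:
                return dst              # cached copy is current, no transfer
    shutil.copyfile(src, dst)           # whole-file update, never a partial write
    with open(tagfile, "w") as f:
        f.write(tag)
    return dst
```

Because updates replace whole files, there is no mid-file write to keep
consistent, which is what makes this cheap compared to a fully
consistent file system.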
> Cache-like behavior would save a lot of manual work but unfortunately
> I am not aware of any working solution for linux,
Several exist. It depends on the semantics you need, as doing this
efficiently requires making assumptions about the access patterns and
update semantics.
> or caching ability of AFS/Coda
> (too cumbersome for cluster) or theoretical features of
> Intermezzo (still and maybe forever unfinished).
...Declare it a success and move on
Donald Becker becker at scyld.com
Scyld Computing Corporation http://www.scyld.com
914 Bay Ridge Road, Suite 220 Scyld Beowulf cluster system
Annapolis MD 21403 410-990-9993
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf