data storage location

Joel Jaeggli joelja at darkwing.uoregon.edu
Fri Sep 12 13:20:34 EDT 2003


On Fri, 12 Sep 2003 hanzl at noel.feld.cvut.cz wrote:

> > The alternative approach is to keep copies of the data on local disk on
> > each node. This gives you good IO rates, but you then have a substantial
> > data management problem; how do you copy 100GB to each node in your
> > cluster in a sensible amount of time, and how do you update the data and
> > make sure it is kept consistent?
> > 
> > ...
> > 
> > If your dataset is larger than the amount of local disk on your nodes, you
> > then have to partition your data up, and integrate that with your queuing
> > system, so that jobs which need a certain bit of the data end up on a node
> > which actually holds a copy.
> 
> This is exactly what we do. But moving the right data to the right place
> and doing good job scheduling at the same time is not easy. Ideally we
> would like to automate it via huge caches on local disks:
> 
>  - central server has some 400GB of read-only data
>  - nodes cache it on their hard disks as needed
>  - queuing system prefers some regular patterns in job/node assignment
>    to make cache hits likely
> 
> Cache-like behavior would save a lot of manual work, but unfortunately
> I am not aware of any working solution for Linux. I want something
> like cachefs (nonexistent for Linux), the caching ability of AFS/Coda
> (too cumbersome for a cluster), or the theoretical features of
> InterMezzo (still, and maybe forever, unfinished).
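
Lacking a cachefs, the read-through half of that behavior can be faked in
user space, one file at a time. A minimal sketch in Python, assuming the
read-only tree is NFS-mounted on every node at /central and each node has
local scratch at /cache (both paths are invented for illustration); cache
eviction and multi-node consistency -- the genuinely hard parts that
AFS/Coda solve -- are left out:

import os
import shutil
import tempfile

CENTRAL = "/central"   # read-only master tree, NFS-mounted on every node (assumed)
CACHE = "/cache"       # scratch space on the node's local disk (assumed)

def cached_open(relpath, mode="rb"):
    """Open a file from the central tree through a local read-through cache."""
    src = os.path.join(CENTRAL, relpath)
    dst = os.path.join(CACHE, relpath)
    sstat = os.stat(src)
    # Reuse the local copy only while its size and mtime still match the master.
    try:
        dstat = os.stat(dst)
        if dstat.st_size == sstat.st_size and dstat.st_mtime >= sstat.st_mtime:
            return open(dst, mode)
    except OSError:
        pass  # not cached yet
    # Copy to a temporary file and rename, so a concurrent job on the same
    # node never sees a half-written cache entry.
    os.makedirs(os.path.dirname(dst), exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(dst))
    os.close(fd)
    shutil.copy2(src, tmp)   # copy2 preserves the mtime we compare against
    os.rename(tmp, dst)
    return open(dst, mode)

A job would then call cached_open("some/partition/file.dat") instead of
opening /central directly: the first job on a node pays for the NFS copy,
later jobs on that node read from local disk.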

it's not clear to me that Coda is too cumbersome for clusters... but the
other approach is to scale the storage infrastructure to the point where its
performance approaches that of locally attached disk. there's a lot of
work that's been done on SAN and iSCSI architectures for storage clusters
that seems relevant to the issues of compute clusters as well...
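
On the scheduling half of the problem, the "regular patterns in job/node
assignment" idea above boils down to data-aware placement. How that gets
wired into a particular queueing system varies, but the core heuristic is
small; a sketch, where the node names, replica count and hash choice are
all invented for illustration:

import hashlib

NODES = ["node%02d" % i for i in range(16)]   # hypothetical cluster
REPLICAS = 2   # how many nodes each data partition is allowed to go warm on

def preferred_nodes(partition):
    """Deterministically map a data partition to a few preferred nodes.

    A stable hash keeps the mapping fixed across scheduler restarts, so the
    same partitions keep landing on the same nodes and the local caches
    there stay warm."""
    h = int(hashlib.md5(partition.encode()).hexdigest(), 16)
    return [NODES[(h + i) % len(NODES)] for i in range(REPLICAS)]

def place(partition, idle_nodes):
    """Prefer an idle node that should already hold the partition; fall back
    to any idle node (cold cache) rather than leave the job queued."""
    for node in preferred_nodes(partition):
        if node in idle_nodes:
            return node
    return idle_nodes[0] if idle_nodes else None

This deliberately trades some load balance for cache hits; a real scheduler
would also want per-node disk accounting and an eviction policy.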
 
> At the moment we are working on a small kernel hack to solve this,
> unfortunately repeating for the n-th time what many others once did
> and never maintained.
> 
> Maybe genome research will generate more need for this data access
> pattern and a better chance of a re-usable software solution?
> 
> Regards
> 
> Vaclav Hanzl
> 

-- 
-------------------------------------------------------------------------- 
Joel Jaeggli  	       Unix Consulting 	       joelja at darkwing.uoregon.edu    
GPG Key Fingerprint:     5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2


_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf


