[Beowulf] Torrents for HPC
bill at cse.ucdavis.edu
Wed Jun 13 17:59:16 EDT 2012
On 06/13/2012 06:40 AM, Bernd Schubert wrote:
> What about an easy to setup cluster file system such as FhGFS?
Great suggestion. I'm all for a generally useful parallel file system
instead of a torrent solution with a very narrow use case.
> As one of
> its developers I'm a bit biased of course, but then I'm also familiar
I think this list is exactly the place where a developer should jump in
and suggest/explain their solutions as they relate to use in HPC clusters.
> with Lustre, and I think FhGFS is far easier to set up. We also do not
> have the problem of running clients and servers on the same node, and some of
> our customers make heavy use of that and use their compute nodes as
> storage servers. That should provide the same or better throughput as
> your torrent system.
I found the wiki, the "view flyer", the FAQ, and related pages.
I had a few questions. I found this link
http://www.fhgfs.com/wiki/wikka.php?wakka=FAQ#ha_support but was not
sure of the details.
What happens when a metadata server dies?
What happens when a storage server dies?
If either of the above means data loss/failure/unreadable files, is there a
description of how to guard against this with drbd+heartbeat or
similar?
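For context on the kind of setup I mean: assuming the metadata server keeps
its state on an ordinary block device, a generic (FhGFS-agnostic) sketch would
be to mirror that partition with DRBD and let heartbeat fail it over. All
names, paths, and addresses below are hypothetical placeholders:

```
# /etc/drbd.d/fhgfs-meta.res -- sketch only; resource name, hostnames,
# devices, and addresses are assumptions, not anything FhGFS-specific.
resource fhgfs-meta {
  protocol C;                    # synchronous replication, no ack until
                                 # the write hits both nodes
  device    /dev/drbd0;
  disk      /dev/sdb1;           # assumed backing partition for metadata
  meta-disk internal;
  on meta1 {                     # primary metadata host (assumed name)
    address 192.168.1.10:7789;
  }
  on meta2 {                     # standby (assumed name)
    address 192.168.1.11:7789;
  }
}
```

On failover, heartbeat would promote DRBD on the survivor, mount /dev/drbd0
at the metadata directory, and restart the metadata daemon there. Whether
FhGFS tolerates that cleanly is exactly what I'm asking.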
It sounds like source is not available, and only binaries for CentOS?
It looks like it needs a kernel module; does that mean only old 2.6.x
CentOS kernels are supported?
Does it work with mainline ofed on qlogic and mellanox hardware?
From a sysadmin point of view I'm also interested in:
* Do blocks auto balance across storage nodes?
* Is managing disk space, inodes (or the equivalent), and related capacity
  planning complex? Or does df report useful/obvious numbers?
* Can storage nodes be added/removed easily by migrating data on/off of
  them?
* Does FhGFS handle 100% of the distributed file system responsibilities,
  or does it layer on top of xfs/ext4 or related (like ceph)?
* With large files, does performance scale reasonably with the number of
  storage servers?
* With small files, does performance scale reasonably with the number of
  metadata servers?
BTW, if anyone is current on any other parallel file system, I (and I
suspect others on the list) would find a similar rundown very valuable.
I run a hadoop cluster, but I suspect there are others on the list who
could provide better answers than I.
My lustre knowledge is second hand and dated.
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf