FhGFS: A Fast and Scalable Parallel Filesystem

FhGFS and Exascale

By providing leading-edge technology and solutions for the HPC community, Fraunhofer is also an important part of the exascale discussion taking place at conferences and in the HPC community in general. Developers at the CC-HPC form a creative think tank to address this topic, identify the challenges and, most importantly, find smart new solutions that pave the way to exascale computing. With broad expertise in both HPC tools and applications, Fraunhofer can attack the exascale problem from several directions, the parallel file system being one of them.

Looking at the first supercomputer to achieve 10 PFlops, the Japanese K Computer, and its dimensions (864 cabinets, 88,000 nodes) as well as its energy consumption (12.6 MW), it is rather safe to say that an exascale machine will not simply consist of 100 such machines. At the same time, it is certain that an exascale system will consume more power and consist of more compute cores. Hence, power consumption, fault tolerance and (software) scalability are the challenges to be solved in order to achieve a usable exascale system. A parallel file system can contribute to that in various ways.
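
To make the "100 such machines" argument concrete, the figures quoted above can simply be multiplied by 100 (10 PFlops x 100 = 1 EFlops). The short sketch below does that arithmetic; the function name and the linear-scaling assumption are purely illustrative, and linear scaling is precisely the approach the paragraph argues is not viable.

```python
# K Computer figures as quoted in the article.
K_CABINETS = 864
K_NODES = 88_000
K_POWER_MW = 12.6


def scale_k_computer(factor):
    """Naively scale the K Computer figures by `factor` (illustrative only)."""
    return {
        "cabinets": K_CABINETS * factor,
        "nodes": K_NODES * factor,
        "power_gw": K_POWER_MW * factor / 1000,  # MW -> GW
    }


if __name__ == "__main__":
    exa = scale_k_computer(100)  # 10 PFlops * 100 = 1 EFlops
    print(f"cabinets: {exa['cabinets']:,}")
    print(f"nodes:    {exa['nodes']:,}")
    print(f"power:    {exa['power_gw']:.2f} GW")
```

Scaled this way, an exascale system would need 86,400 cabinets, 8.8 million nodes and about 1.26 GW of power, which illustrates why power consumption and component count head the list of challenges.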

One example of how to address power consumption at the file system level is to leverage the natural levels of storage: current jobs, short-term working set and long-term data archive. By introducing support for hierarchical storage management (HSM) and using energy-efficient technologies, such as tape, for long-term storage, a significant reduction in energy consumption can be achieved. For this purpose, Fraunhofer ITWM has teamed up with Germany-based Grau Data. FhGFS will implement HSM support in its metadata server, which will then interact directly with the Grau Archive Manager to provide a scalable HSM solution.
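
As a rough illustration of the three storage levels named above, the following sketch assigns a file to a level based on how long it has been idle. This is not how FhGFS or the Grau Archive Manager decides placement; the function name, the idle-time criterion and both thresholds are assumptions made up for this example.

```python
from datetime import timedelta

# Hypothetical thresholds, chosen only for illustration.
WORKING_SET_AFTER = timedelta(days=7)
ARCHIVE_AFTER = timedelta(days=90)


def storage_level(idle_time):
    """Map the time since a file's last access to one of the three levels.

    Assumed policy: the longer a file has been idle, the colder (and more
    energy efficient) the storage level it belongs to.
    """
    if idle_time >= ARCHIVE_AFTER:
        return "long-term data archive"  # e.g. tape
    if idle_time >= WORKING_SET_AFTER:
        return "short-term working set"
    return "current jobs"


if __name__ == "__main__":
    for idle in (timedelta(hours=2), timedelta(days=30), timedelta(days=400)):
        print(idle, "->", storage_level(idle))
```

A real HSM policy would likely weigh more than idle time, for example file size or project quotas, but the principle of moving cold data to low-power media is the same.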

On the scalability side, static striping patterns, as they are common today, are one of the bottlenecks for throughput scaling. Choosing the right number of storage targets in advance is almost impossible, because in most cases it is not known how large the files in the file system will be. Optimizing for small files, i.e. using few storage targets, slows down performance for large files, while optimizing for large files by using many storage targets increases the overhead and potentially slows down small-file performance. Technically, a user could influence these patterns, but a regular file system user shouldn't have to deal with the number of storage targets. As a solution, FhGFS is going to provide automated irregular striping, allocating more targets as the file grows. The beauty of this solution is that additional targets are only used when the performance gain outweighs the additional overhead. This ensures fast access to any file, regardless of its size.
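
The idea of allocating more targets as a file grows can be sketched with a simple growth rule. The code below is not the FhGFS algorithm, which the article does not describe in detail; the function name, the doubling policy, the 1 MiB first step and the cap of 16 targets are all hypothetical values chosen to show the principle.

```python
MIB = 1 << 20


def targets_for_size(file_size, first_step=MIB, max_targets=16):
    """Return the number of storage targets for a file of `file_size` bytes.

    Assumed policy (illustrative only): a file uses a single target up to
    `first_step` bytes; each time the file outgrows the current limit, both
    the target count and the limit double, until `max_targets` is reached.
    """
    targets = 1
    limit = first_step
    while file_size > limit and targets < max_targets:
        targets *= 2
        limit *= 2
    return targets


if __name__ == "__main__":
    for size in (4096, MIB, 3 * MIB, 100 * MIB, 1 << 40):
        print(f"{size:>15,} bytes -> {targets_for_size(size)} target(s)")
```

With these assumed defaults, a file of up to 1 MiB stays on a single target and avoids any striping overhead, while a 1 TiB file is spread across the full 16-target cap. A static pattern would have to pick one of these two extremes for every file.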

Assuming an exascale system will have far more components than today's systems, failures of individual parts will become more frequent. Fault tolerance, especially keeping data available when something in the system breaks, is becoming an increasingly important requirement for parallel file systems. Using redundant arrays of storage doesn't quite solve the problem. Besides being expensive and complex to configure and manage, in the end it only lowers the probability of data loss and relies on the redundant array itself not breaking. Keeping the data redundant within the file system is a better solution and the design path chosen by the FhGFS team. This method keeps complexity low and only adds the cost of inexpensive additional disk capacity. Current versions already come with High Availability (HA) support and allow mirroring metadata and/or file contents on a per-file/per-directory basis as well as individual mirrors for each file/directory.

Finally, what happens if an I/O-bound job still doesn't run as fast as expected? To address this issue, FhGFS provides monitoring and analysis tools that offer live statistics, profiling and much more. FhGFS already comes with a graphical monitoring and administration solution, the AdMon. The challenge for all such tools is how to visualize all this information in a way that is still intuitive for users and administrators.

Where to get FhGFS?

FhGFS is provided free of charge and the packages of the current stable release (2012.10-r4) can be downloaded directly from the project website at www.fhgfs.com or from repositories for the different Linux distributions. Commercial support is available directly from Fraunhofer or from its international partners. New FhGFS updates will also be announced on the FhGFS Twitter page.


Tobias Goetz holds an M.S. in Computer Science from the University of Tübingen, Germany, and has been a researcher and project manager at the Competence Center High-Performance Computing at the Fraunhofer Institute for Industrial Mathematics in Kaiserslautern, Germany, since December 2008.

    Creative Commons License
    ©2005-2012 Copyright Seagrove LLC, Some rights reserved. Except where otherwise noted, this site is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License. The Cluster Monkey Logo and Monkey Character are Trademarks of Seagrove LLC.