FhGFS: A Fast and Scalable Parallel Filesystem

Crafted in Germany, FhGFS is ready to take on the world's biggest I/O challenges

The Fraunhofer Parallel File System (FhGFS) is the high-performance parallel file system of the Fraunhofer Institute for Industrial Mathematics in Kaiserslautern, Germany. It includes a distributed metadata architecture that has been designed to provide the scalability and flexibility required to run today's most demanding HPC applications while being easy to use and manage.

"There must be a better way to do this!" is the simple motive that became the driving force for the development of a new parallel file system by researchers at the Fraunhofer Institute for Industrial Mathematics (ITWM) in Kaiserslautern, Germany. After a fruitless search for an easy-to-use, low-cost, highly scalable alternative to the file system on the institute's supercomputer, a team working with Dr. Franz-Josef Pfreundt, head of ITWM's Competence Center High-Performance Computing (CC-HPC), decided in 2004 to fill this gap and develop its own parallel file system. That's when the Fraunhofer Parallel File System (FhGFS) was born.

About Fraunhofer
Fraunhofer (FhG) is one of Europe's largest research organizations. Its mission is to undertake applied research of direct utility to private and public enterprise and of wide benefit to society. Fraunhofer maintains more than 80 research institutions worldwide -- among them 60 institutes in Germany -- and employs over 20,000 people, the majority with master's or doctoral degrees in the natural sciences. More than 80 percent of the annual research budget of 2 billion euros (~$2.6 billion, 2012) stems from contract research, the rest from public funding.

The ITWM is one of Fraunhofer's institutes and the first to focus its research on industrial mathematics, working in fields such as optimization, fluid dynamics and simulation as well as HPC.

The CC-HPC at the ITWM is active in several fields, developing HPC tools such as FhGFS and GPI as well as proprietary HPC applications for customers. The department also has a strong focus on CPU-based visualization techniques and Green-by-IT technologies.

FhGFS started as a high-performance parallel file system dedicated to the HPC community. Today it is used in HPC centers at universities, research centers and industry worldwide, among them TOP500 clusters such as the one at the Goethe University in Frankfurt, Germany.

Taking advantage of a "clean sheet of paper" design, the developer team, led by Sven Breuner, was able to set the requirements and key features of FhGFS without any constraints. The goal was a system with a scalable multi-threaded architecture that distributes metadata, doesn't require any kernel patches, supports several network interconnects including native InfiniBand, and is easy to install and manage. All these considerations led to three cornerstones for FhGFS development:

  • Maximum Scalability
  • Maximum Flexibility
  • Easy to Use

Key Concepts

FhGFS runs on any Linux machine and consists of several components that include services for clients, metadata servers and storage servers. In addition, there is a service for the management host as well as one for a graphical administration and monitoring system. A diagram depicting the architecture is given in Figure One.

To run FhGFS, at least one instance of the metadata server and the storage server is required. But FhGFS allows multiple instances of each service to distribute the load from a large number of clients. To guarantee maximum scalability for the file system, each individual component was designed to scale. Consequently, the system itself scales with the number of clients, metadata servers and storage servers, regardless of their combination.


Figure One: FhGFS component design

Many thoughtful implementation ideas contribute to the ability of FhGFS to scale. File contents are distributed over several storage servers using striping, i.e. each file is split into chunks of a given size and these chunks are distributed over the existing storage servers. The size of these chunks can be defined by the file system administrator. In addition, the metadata is distributed over several metadata servers at the directory level, with each server storing a part of the complete file system tree. This approach allows much faster access to the data, because requests are spread over several servers instead of queuing at a single one. Other factors include direct and parallel access to files on the storage servers by the clients as well as support for high-speed network interconnects such as native InfiniBand.
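The striping idea described above can be illustrated with a small Python sketch. It is not FhGFS code: the round-robin placement, the 512 KiB chunk size, the server names and the function names `locate` and `split_request` are all assumptions chosen for illustration. The sketch only shows how a byte offset maps to a chunk and a storage server, and how one request that crosses a chunk boundary ends up touching more than one server.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkLocation:
    chunk_index: int      # which chunk of the file the byte falls into
    server: str           # storage server assumed to hold that chunk
    offset_in_chunk: int  # byte offset inside the chunk


def locate(offset: int, chunk_size: int, servers: list[str]) -> ChunkLocation:
    """Map a file offset to a chunk and a server (round-robin is an assumption)."""
    if offset < 0 or chunk_size <= 0 or not servers:
        raise ValueError("need offset >= 0, chunk_size > 0 and at least one server")
    chunk_index, offset_in_chunk = divmod(offset, chunk_size)
    server = servers[chunk_index % len(servers)]
    return ChunkLocation(chunk_index, server, offset_in_chunk)


def split_request(offset: int, length: int, chunk_size: int,
                  servers: list[str]) -> list[tuple[str, int, int, int]]:
    """Split a byte range into (server, chunk_index, offset_in_chunk, nbytes) pieces."""
    pieces = []
    end = offset + length
    while offset < end:
        loc = locate(offset, chunk_size, servers)
        # Read up to the end of the current chunk or the end of the request.
        nbytes = min(chunk_size - loc.offset_in_chunk, end - offset)
        pieces.append((loc.server, loc.chunk_index, loc.offset_in_chunk, nbytes))
        offset += nbytes
    return pieces


if __name__ == "__main__":
    CHUNK = 512 * 1024                            # hypothetical chunk size
    SERVERS = ["storage1", "storage2", "storage3"]  # hypothetical server names
    # A 300-byte read that starts 100 bytes before a chunk boundary.
    for piece in split_request(CHUNK - 100, 300, CHUNK, SERVERS):
        print(piece)
```

In this toy model, the pieces of a single request can be fetched from different servers independently, which is the property that lets clients access storage servers directly and in parallel.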

Flexibility can take various forms with FhGFS. Additional clients as well as metadata or storage servers can easily be added to an existing system without any downtime. In addition, the servers run on top of an existing local file system. While there are no restrictions on the type of underlying file system, the recommendation is to use ext4 for the metadata servers and XFS for the storage servers. In terms of hardware, there is no strict requirement for dedicated hardware for individual services. This design allows a file system administrator to start the services in any combination on a given set of machines and expand in the future. A common way to take advantage of this is to combine metadata servers and storage servers on the same machines, as shown in Figure Two.
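To make the "any combination" idea concrete, here is a small Python sketch of a service layout plan. It is purely illustrative and does not correspond to any FhGFS configuration format: the host names, the service labels and the `validate_layout` helper are hypothetical. The only rule it encodes is the one stated earlier in the article, namely that at least one metadata server and one storage server must exist.

```python
from __future__ import annotations

# Services the article names as mandatory for a running file system.
REQUIRED = {"metadata", "storage"}


def validate_layout(layout: dict[str, set[str]]) -> dict[str, list[str]]:
    """Invert host -> services into service -> hosts and check the minimum set."""
    by_service: dict[str, list[str]] = {}
    for host, services in layout.items():
        for service in services:
            by_service.setdefault(service, []).append(host)
    missing = REQUIRED - set(by_service)
    if missing:
        raise ValueError(f"no host runs: {sorted(missing)}")
    return {service: sorted(hosts) for service, hosts in by_service.items()}


if __name__ == "__main__":
    # Hypothetical layout: metadata and storage services share the same
    # machines, while a third node acts as a client only.
    combined = {
        "node01": {"metadata", "storage"},
        "node02": {"metadata", "storage"},
        "node03": {"client"},
    }
    for service, hosts in sorted(validate_layout(combined).items()):
        print(service, hosts)
```

Growing the system in this model means adding another entry to the dictionary; nothing about the existing hosts has to change.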


Figure Two: FhGFS with combined metadata and storage servers on the same machine

One of the newest features that strengthens flexibility is support for an on-demand parallel file system instance. A single command line creates an instance of FhGFS over a set of nodes. This feature enables a variety of new use cases, e.g. setting up a dedicated file system for an individual cluster job or for cloud computing. It also speeds up file system tests, because it is a fast and easy way to set up a test system.

On top of this, support for various network interconnects with dynamic failover as well as many different Linux distributions and kernels allows flexible use in almost every environment. All these options together enable file system administrators to fine-tune their own installation of FhGFS in a variety of ways. FhGFS comes with a rich set of utilities, and the developers have put together insider tips on how to tune the file system to the given hardware setting. These tips can be found together with installation instructions and further information in the publicly available FhGFS wiki.

FhGFS server processes run in userspace, and the client itself is a lightweight kernel module that doesn't require any kernel patches. FhGFS runs on any Linux distribution and does not impose hardware requirements on the user. (Of course, faster hardware helps increase system throughput.)

One strength of FhGFS is ease of use. The file system has a very simple setup and startup mechanism. For users who prefer a graphical interface over the command line, a Java-based GUI is available. The GUI provides monitoring of the FhGFS state and management of system settings in an intuitive way, with no need for command line interaction. Besides managing and administering the FhGFS installation, this tool also offers several monitoring options to immediately identify performance problems within the system.

    Creative Commons License
    ©2005-2012 Copyright Seagrove LLC, Some rights reserved. Except where otherwise noted, this site is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License. The Cluster Monkey Logo and Monkey Character are Trademarks of Seagrove LLC.