Benchmarking Parallel File Systems
In the last column I spent quite a bit of time explaining the design of a new set of benchmarks for parallel file systems called PIObench. I discussed critical topics such as timing, memory access patterns, disk access patterns, and usage patterns. Due to constraints on how much I can write in a single column, I didn't talk about some other items, such as the influence of disk caching, and I won't talk about them in this column either. For those interested in things of this nature, take a look at Frank Shorter's PIObench thesis.
In this article I want to talk about building and using PIObench and then actually testing a parallel file system with it.
Building PIObench
PIObench is available from the PARL website. It is very easy to build, and the README has excellent instructions. If you're having trouble with the README, the Makefile is very simple and clear. The only requirement for building PIObench is to have mpicc installed, which means you need an MPI library built and installed. More specifically, you need an MPI that conforms to MPI-2, or at least one that includes MPI-IO support.
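If you're not sure whether the MPI compiler wrapper is in place, a quick sanity check (this assumes the wrapper scripts are already in your PATH) is simply:

% which mpicc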
To build the code you just do the following:
% tar -xzf pio-bench.tar.gz
% make

The resulting code is called pio-bench.
Inputs to PIObench
Recall that the PIObench benchmark, as currently written, includes a number of common spatial access patterns. These patterns are simple strided, nested strided, random strided, sequential access, segmented access, tiled, flash I/O kernel, and unstructured mesh. The benchmark has a fairly simple "language" that allows you to specify which patterns you want to run and what parameters each pattern needs. The benchmark looks for a configuration file called, oddly enough, pio-bench.conf. It then parses the configuration, which consists of directives written in this "language," and runs the tests.
The language closely resembles Apache-style configuration files, and the PIObench code uses the Dotconf parser to read it. The language is used for specifying the timing configuration as well as the configuration of the specific access patterns. There are also global configuration parameters that are set using the same configuration file format.
There are six possible global timing directives: TestFile, ShowAverage, ReportOpenTime, ReportCloseTime, ReportHeaderOffset, and SyncWrites. These are placed at the top of the configuration file. I won't go over these directives in this column. The details are in Frank's thesis on the PARL website (see the sidebar for more information).
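To give a feel for the format, here is a rough sketch of what the top of a configuration file might look like with all six global directives set. The argument values shown are illustrative assumptions on my part (only TestFile appears in Listing One), so consult Frank's thesis for the exact syntax of the others.

# Global timing directives (argument values are illustrative)
TestFile "/mnt/pvfs/file_under_test"
ShowAverage 1
ReportOpenTime 1
ReportCloseTime 1
ReportHeaderOffset 0
SyncWrites 1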
After specifying the global directives at the top of the input file, you can use the access pattern module directives. These are fairly simple. The directive ModuleName names one of the access patterns that have been compiled into the benchmark.
An example of an overall configuration file is in Listing One.
Comments can be made by using the pound symbol "#" before a statement.

# Example configuration file for PIObench
TestFile "/mnt/pvfs/file_under_test"

ModuleName "Simple Strided(write)"
ModuleReps 100
ModuleSettleTime 30

ModuleName "Nested Strided(re-write)"
ModuleReps 50
ModuleSettleTime 10
You have to run the pio-bench code using the appropriate launcher for whichever MPI implementation you used. For example, if you used MPICH, you would launch it with mpirun.
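For instance, a minimal launch might look like the following. The process count of four is arbitrary, and this assumes that the pio-bench executable and its pio-bench.conf file sit in the current working directory.

% mpirun -np 4 ./pio-bench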
Outputs
The output from PIObench is very simple. It gives you details about which access pattern was run, which spatial pattern was used, the operations that were performed, the time they took, and, finally, the bandwidth.
Test Machine
In his thesis, Frank presents some very comprehensive testing of a parallel file system. He ran the tests on a 16-node development cluster. Each node had dual Intel Pentium III/1 GHz processors with 1 GB of memory, two Maxtor 30 GB hard drives, an Intel Express Pro 10/100 Fast Ethernet card, and a high-speed Myrinet network card. Each node was running Red Hat 8.0 with a 2.4.20 kernel. The cluster was also using PVFS (Parallel Virtual File System) 1.5.7.

The cluster had four mounted file systems. Two of the file systems used all 16 nodes as PVFS I/O daemons (16 IODs); one used Myrinet as the interconnect and the other used Fast Ethernet. The other two PVFS file systems used only 4 of the nodes (4 IODs). The master node of the cluster was the MGR (metadata manager) for all four PVFS file systems. The stripe size for PVFS was set to 64 KB (kilobytes), and MPICH 1.2.5.1a was used for all of the testing.
The tests were conducted a few years ago, so the hardware and software are a bit old. Nonetheless, the results indicate the usefulness of the PIObench benchmark in measuring the performance of a parallel file system. They also illustrate the ease with which you can run many types of access patterns.
Test Inputs
Each of the tests was run 50 times (the directive ModuleReps 50 in the module directive section of the input file). The defaults were used for the specific spatial access patterns, except for the buffer size and the number of work units.
The total amount of data transferred during a test is the buffer size, which is the size of a single I/O request, multiplied by the number of I/O requests (termed work units). However, Mr Shorter was interested in determining the effects of varying the size of an individual I/O request, so the results are plotted versus buffer size rather than the total amount of data transferred.
Results
Mr Shorter's thesis had a great many results, exercising each of the access patterns that are built into the current version of PIObench. I won't discuss all of them since I don't have the room. However, I'd like to discuss three of them: Simple-Strided, Nested-Strided, and Unstructured Mesh.
For each spatial access pattern, the tests were run 50 times and the resulting time was averaged. The amount of data in an I/O request, termed the buffer size, was varied from a small number to a larger number and the number of work units was held constant. The resulting bandwidth was computed and plotted versus buffer size.
Simple Strided
The first pattern considered is the Simple Strided since it comprises about 66% of codes that use parallel I/O. The following patterns (ModuleName in the benchmark input file) were examined: "Simple Strided (read)", "Simple Strided (write)", "Simple Strided (re-read)", "Simple Strided (re-write)", and "Simple Strided (read-modify-write)". These are the five temporal access patterns discussed in the previous column.
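For reference, here is a sketch of what the module section of the input file might have looked like for these runs. It assumes the 50 repetitions mentioned above; the buffer size and work unit settings actually used are not reproduced here, since those directives aren't covered in this column.

# One module stanza per temporal access pattern, each run 50 times
ModuleName "Simple Strided (read)"
ModuleReps 50

ModuleName "Simple Strided (write)"
ModuleReps 50

ModuleName "Simple Strided (re-read)"
ModuleReps 50

ModuleName "Simple Strided (re-write)"
ModuleReps 50

ModuleName "Simple Strided (read-modify-write)"
ModuleReps 50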
The results for the simple strided spatial access pattern can be seen in Figure One.
Notice in the graph that the maximum write performance for this parallel file system (PVFS), about 450 MB/s, was achieved with a buffer size of 64 KB (the same as the stripe size used in PVFS). Read performance peaked at about 490 MB/s with a buffer size of about 1 MB.