PVFS-Users: Performance issue with PVFS
PVFS (Parallel Virtual FileSystem) combines the hard drive space
on compute nodes or dedicated I/O nodes to form a single filesystem. PVFS
is built on top of any standard Linux filesystem. There are various ways
to access the filesystem, including the traditional Linux filesystem
commands and ROMIO (I/O for MPI). More importantly, PVFS can greatly
increase I/O performance compared to traditional filesystems.
While a bit old, there was a discussion on the PVFS-Users mailing list,
started on the 16th of July 2003 by Craig Tierney, about performance
issues he was having on two PVFS systems, with read performance being
significantly slower than write performance. The initial discussion covered
tuning some of the parameters in PVFS, such as the stripe size, and tuning
some parameters on the nodes themselves. These changes had only a small
effect on the read performance. Then Craig and Rob Ross discovered that
switching the client code from using mmap() calls to read data to using
sendfile() calls greatly increased performance. These two functions,
mmap() and sendfile(), are low-level Linux functions that PVFS uses to
move data: mmap() maps a file into memory, while sendfile() copies data
directly between file descriptors inside the kernel. The performance
boost was very good, resulting in about a 50% improvement for one cluster
and almost a 300% improvement for another. In fact, the PVFS team
made this an option in the current version of PVFS so people can tune
their PVFS setup for maximum performance. The team is also looking at
adding some tuning suggestions to their FAQ (Frequently Asked Questions)
and perhaps writing a code that can be used to measure PVFS performance
as a sort of "baseline" to help people tune PVFS. Of course, the ultimate
code for tuning is the user's application.
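
To make the difference between the two calls concrete, here is a minimal sketch in Python (not PVFS code, and not the change Craig and Rob actually made). It copies a file once through a memory mapping and once with sendfile(). The function names and the temporary files are made up for illustration, and the sendfile() version assumes a reasonably recent Linux kernel, where the destination may be a regular file rather than a socket.

```python
import mmap
import os
import tempfile


def copy_with_mmap(src_path, dst_path):
    """Copy src to dst by mapping the source file into memory first."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        size = os.fstat(src.fileno()).st_size
        if size == 0:
            return 0  # an empty file cannot be memory-mapped
        with mmap.mmap(src.fileno(), 0, access=mmap.ACCESS_READ) as view:
            dst.write(view)  # the data passes through user space
        return size


def copy_with_sendfile(src_path, dst_path):
    """Copy src to dst with sendfile(), keeping the data in the kernel."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        size = os.fstat(src.fileno()).st_size
        offset = 0
        while offset < size:
            # sendfile() may move fewer bytes than requested, so loop.
            sent = os.sendfile(dst.fileno(), src.fileno(), offset, size - offset)
            if sent == 0:
                break
            offset += sent
        return offset


if __name__ == "__main__":
    # Hypothetical demo data: 1 MB of random bytes in a scratch directory.
    workdir = tempfile.mkdtemp()
    source = os.path.join(workdir, "source.bin")
    with open(source, "wb") as handle:
        handle.write(os.urandom(1024 * 1024))
    print(copy_with_mmap(source, os.path.join(workdir, "via_mmap.bin")))
    print(copy_with_sendfile(source, os.path.join(workdir, "via_sendfile.bin")))
```

Both functions produce the same output file; any speed difference depends on the kernel, the storage, and the file size, so the numbers from the mailing list should not be expected from a toy like this.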
An important point about open cluster software like PVFS is the fact that
it can be tuned for maximum performance based on the user's application.
PVFS-Users: Size of PVFS Clusters
On August 11th 2003, Nathan Poznick asked the PVFS-Users mailing list what
the largest PVFS (Parallel Virtual FileSystem) cluster was, how it was
configured, and how well it performed. Rob Ross responded that
he knew of systems with tens of servers, hundreds of clients, and tens of
TeraBytes (TB) of storage that have performed quite well. He indicated
that over Myrinet, they were able to achieve about 3.5 GigaBytes (GB)
per second aggregate bandwidth using IDE (Integrated Drive Electronics)
disks. Troy Baer responded that their current configuration has 16 I/O
nodes, with just a little under 10 TB of raw disk. They have achieved
sustained performance of about 1.4 to 1.6 GB/sec for simple tests
(e.g. ROMIO perf) and about 100-400 MegaBytes (MB) per second for real
applications such as the ASCI Flash I/O code or the NAS BTIO code. He
included a link to a paper discussing the results. Craig Tierney also
discussed a system he was sizing for a bid on a project that used multiple
FC (Fibre Channel) arrays to keep the number of IOD (Input/Output Daemon)
nodes to a minimum. He said that he has seen performance of 150 MB/sec
with nodes that have a reasonably fast disk. In his opinion, he did not
see any reason that a PVFS filesystem could not be designed for 10 GB/sec
performance.
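
For a rough feel for these numbers, the arithmetic can be sketched in a few lines of Python. This is a back-of-the-envelope estimate only. It assumes 1 GB = 1000 MB, that the 150 MB/sec figure is per I/O node, and that bandwidth scales perfectly linearly with the number of I/O nodes, none of which was stated on the list. The function names are made up for illustration.

```python
import math


def per_node_bandwidth(aggregate_mb_s, io_nodes):
    """Average bandwidth each I/O node contributes, in MB/sec."""
    return aggregate_mb_s / io_nodes


def io_nodes_needed(target_mb_s, per_node_mb_s):
    """I/O nodes required to hit a target, assuming linear scaling."""
    return math.ceil(target_mb_s / per_node_mb_s)


# 16 I/O nodes sustaining 1.4 to 1.6 GB/sec on simple tests
print(per_node_bandwidth(1400, 16))   # 87.5
print(per_node_bandwidth(1600, 16))   # 100.0

# A 10 GB/sec design built from nodes delivering 150 MB/sec each
print(io_nodes_needed(10000, 150))    # 67
```

In practice the interconnect and the clients also limit throughput, so a real design would need some headroom beyond this estimate.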
Keep in mind that the numbers mentioned are how fast parallel programs
read or write data to a PVFS filesystem and are a function of the I/O
servers, the interconnect, and the cluster nodes. Searching for articles on the
Internet will show how PVFS was used to achieve over 1 GB/sec performance
on simple clusters. If I/O performance is a crucial part of your
application or if you just want to test PVFS, then visit the
PVFS website. The code is open source
and has been used in production at several sites.
This article was originally published in ClusterWorld Magazine. It has been updated and formatted for the web. If you want to read more about HPC clusters and Linux you may wish to visit Linux Magazine.
Jeff Layton has been a cluster enthusiast since 1997 and spends far too
much time reading mailing lists.