Autonegotiation, Diskless, PVFS, Multicast Discussions
Written by Jeff Layton   
Sunday, 04 September 2005

PVFS-Users: Performance issue with PVFS

PVFS, the Parallel Virtual File System, combines the hard drive space on compute nodes or dedicated I/O nodes to form a single filesystem. PVFS is built on top of any standard Linux filesystem. There are various ways to access the filesystem, including the traditional Linux filesystem commands and ROMIO (I/O for MPI). More importantly, PVFS can greatly increase I/O performance compared to traditional filesystems.
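One way to picture how several disks can appear as a single filesystem is round-robin striping, where consecutive fixed-size chunks of a file are placed on successive I/O nodes. The sketch below illustrates that general idea only; it is not PVFS's layout code. The function name `locate`, the 64 KiB stripe size, and the count of four servers are assumptions chosen for the example.

```python
def locate(offset, stripe_size=64 * 1024, num_servers=4):
    """Map a byte offset in a striped file to (server index, offset on that server).

    Illustrative round-robin layout: stripe 0 goes to server 0, stripe 1 to
    server 1, and so on, wrapping around after the last server.
    """
    if offset < 0:
        raise ValueError("offset must be non-negative")
    stripe_index = offset // stripe_size          # which stripe holds this byte
    server = stripe_index % num_servers           # round-robin server choice
    local_stripe = stripe_index // num_servers    # stripes already on that server
    local_offset = local_stripe * stripe_size + offset % stripe_size
    return server, local_offset


if __name__ == "__main__":
    for off in (0, 64 * 1024, 5 * 64 * 1024 + 100):
        print(off, "->", locate(off))
```

With a layout like this, a large sequential read touches every server in turn, which is why the stripe size and the number of I/O nodes both influence the bandwidth a client sees.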

Although it is a bit old, a discussion that Craig Tierney started on the PVFS-Users mailing list on July 16, 2003 is worth revisiting. Craig was having performance problems on two PVFS systems, with read performance significantly slower than write performance. The initial discussion covered tuning PVFS parameters, such as the stripe size, and tuning parameters on the nodes themselves. These changes had only a small effect on read performance. Then Craig and Rob Ross discovered that switching the client code from mmap() calls to sendfile() calls for reading data greatly increased performance. These two functions, mmap() and sendfile(), are low-level Linux system calls that PVFS uses to move data: mmap() maps a file into memory, while sendfile() copies data directly from one file descriptor to another inside the kernel. The performance boost was very good, about a 50% improvement for one cluster and almost a 300% improvement for another. In fact, the PVFS team made this an option in the current version of PVFS so people can tune their PVFS setup for maximum performance. The team is also looking at adding some tuning suggestions to their FAQ (Frequently Asked Questions) and perhaps writing a benchmark program that can measure PVFS performance as a sort of "baseline" to help people tune PVFS. Of course, the ultimate benchmark for tuning is the user's own application.
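To make the difference between the two calls concrete, here is a minimal sketch that pushes the contents of a file into a socket both ways. It is not taken from PVFS, which is written in C; it uses Python's `mmap` module and `os.sendfile` wrapper and assumes a Linux system. The names `send_with_mmap`, `send_with_sendfile`, and `roundtrip` are hypothetical helpers invented for this illustration.

```python
import mmap
import os
import socket
import tempfile
import threading


def send_with_mmap(sock, path):
    """Map the file into memory, then write the mapped bytes to the socket."""
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        if size == 0:
            return 0
        with mmap.mmap(f.fileno(), size, access=mmap.ACCESS_READ) as view:
            sock.sendall(view)  # data passes through user-space memory
    return size


def send_with_sendfile(sock, path):
    """Ask the kernel to copy the file to the socket directly."""
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        offset = 0
        while offset < size:
            # sendfile may transfer fewer bytes than requested, so loop.
            sent = os.sendfile(sock.fileno(), f.fileno(), offset, size - offset)
            if sent == 0:
                break
            offset += sent
    return offset


def roundtrip(sender, payload):
    """Hypothetical test helper: send payload with `sender`, return what arrives."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
    left, right = socket.socketpair()
    received = bytearray()

    def drain():
        # Read until the sending side is closed.
        while True:
            chunk = right.recv(65536)
            if not chunk:
                break
            received.extend(chunk)

    reader = threading.Thread(target=drain)
    reader.start()
    try:
        sender(left, tmp.name)
    finally:
        left.close()
        reader.join()
        right.close()
        os.unlink(tmp.name)
    return bytes(received)


if __name__ == "__main__":
    data = os.urandom(256 * 1024)
    for sender in (send_with_mmap, send_with_sendfile):
        print(sender.__name__, roundtrip(sender, data) == data)
```

Both functions deliver the same bytes. Which one is faster depends on the kernel, the hardware, and the access pattern, which is consistent with the PVFS team exposing the choice as a tunable option rather than hard-coding one path.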

An important point about open cluster software like PVFS is that it can be tuned for maximum performance based on the user's application.

PVFS-Users: Size of PVFS Clusters

On August 11, 2003, Nathan Poznick asked the PVFS-Users mailing list what the largest PVFS (Parallel Virtual File System) cluster was, how it was configured, and how well it performed. Rob Ross responded that he knew of systems with tens of servers, hundreds of clients, and tens of terabytes (TB) of storage that have performed quite well. He indicated that over Myrinet, they were able to achieve about 3.5 gigabytes (GB) per second of aggregate bandwidth using IDE (Integrated Drive Electronics) disks. Troy Baer responded that their current configuration has 16 I/O nodes with just under 10 TB of raw disk. They have achieved sustained performance of about 1.4 to 1.6 GB/sec for simple tests (e.g., ROMIO perf) and about 100-400 megabytes (MB) per second for real applications such as the ASCI Flash I/O code or the NAS BTIO code. He included a link to a paper discussing the results. Craig Tierney also discussed a system he was sizing for a bid on a project that used multiple FC (Fibre Channel) arrays to keep the number of IOD (Input/Output Daemon) nodes to a minimum. He said that he has seen performance of 150 MB/sec with nodes that have a reasonably fast disk. In his opinion, there was no reason that a PVFS filesystem could not be designed for 10 GB/sec performance.
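The figures quoted in this thread can be put into perspective with some back-of-the-envelope arithmetic. The sketch below rests on three assumptions that the thread itself does not state: that aggregate bandwidth scales linearly with the number of I/O nodes, that Craig's 150 MB/sec figure is per I/O node, and that 1 GB is 1000 MB. The function names are hypothetical.

```python
import math


def per_server_mb_per_s(aggregate_mb_per_s, servers):
    """Average bandwidth each I/O node contributes, assuming an even split."""
    return aggregate_mb_per_s / servers


def servers_needed(target_mb_per_s, per_server_rate_mb_per_s):
    """I/O nodes required for a target rate, assuming ideal linear scaling."""
    return math.ceil(target_mb_per_s / per_server_rate_mb_per_s)


if __name__ == "__main__":
    # Troy Baer's system: 16 I/O nodes, 1.4 to 1.6 GB/sec on simple tests.
    print(per_server_mb_per_s(1400, 16), per_server_mb_per_s(1600, 16))
    # A 10 GB/sec design at an assumed 150 MB/sec per I/O node.
    print(servers_needed(10_000, 150))
```

Under these assumptions, the 16-node system averages roughly 90 to 100 MB/sec per I/O node, and a 10 GB/sec design would need on the order of 67 I/O nodes. Real systems rarely scale perfectly, so these numbers are a lower bound on the hardware required rather than a sizing recommendation.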

Keep in mind that the numbers mentioned describe how fast parallel programs read or write data to a PVFS filesystem, and they depend on the I/O servers, the interconnect, and the cluster nodes. Searching for articles on the Internet will show how PVFS was used to achieve over 1 GB/sec performance on simple clusters. If I/O performance is a crucial part of your application or if you just want to test PVFS, then visit the website. The code is open source and has been used in production at several sites.

This article was originally published in ClusterWorld Magazine. It has been updated and formatted for the web. If you want to read more about HPC clusters and Linux you may wish to visit Linux Magazine.

Jeff Layton has been a cluster enthusiast since 1997 and spends far too much time reading mailing lists.

Last Updated ( Tuesday, 09 January 2007 )
 
 

Creative Commons License
  ©2005-2008 Seagrove LLC, Some rights reserved. Except where otherwise noted, this site is licensed under a
Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License.
Cluster Monkey Logo and Monkey Character are Trademarks of Seagrove LLC.