Another way to take advantage of PVFS is to use MPI-IO with PVFS1 as the underlying file system. MPI-IO (Message Passing Interface Input/Output) is defined in the MPI-2 standard and is the standard's response to the call for a parallel I/O capability. MPI-IO has many features that allow all of the MPI processes to participate in I/O, rather than the more traditional approaches of having the rank 0 process perform all I/O or splitting the data so that each process has its own set of data. While one might think that the native PVFS1 library interface would yield the fastest I/O, MPI-IO features such as noncontiguous accesses and collective I/O, coupled with PVFS1, can achieve some very high I/O rates. For example, ROMIO, an implementation of MPI-IO from Argonne National Laboratory, can be built to use PVFS1 and provides MPI-IO functions for the MPICH and LAM/MPI packages. The semantics of the MPI-IO functions were designed to be fairly close to UNIX/POSIX I/O semantics, and both C and Fortran versions of the functions exist. While a full discussion of MPI-IO is beyond the scope of this column, let's take a quick look at some of the MPI-IO functions.

Opening a file using MPI-IO is fairly similar to the open() function.

int MPI_File_open(MPI_Comm comm, 
                  char *pathname, 
                  int amode, 
                  MPI_Info info, 
                  MPI_File *fh);
All MPI processes call this function. It is passed the MPI communicator (usually MPI_COMM_WORLD); the pathname to the file, which should be on a PVFS-mounted file system (although it doesn't have to be); the access mode, built from the MPI-IO mode constants; and a "hint" argument of type MPI_Info. The function returns a file handle of type MPI_File. Each MPI process can then perform I/O using this file handle. Here are the basic MPI-IO functions:
int MPI_File_open(MPI_Comm comm, 
                  char *pathname, 
                  int amode, 
                  MPI_Info info, 
                  MPI_File *fh);

int MPI_File_seek(MPI_File fh, 
                  MPI_Offset offset, 
                  int whence);

int MPI_File_read(MPI_File fh, 
                  void *buf, 
                  int count,
                  MPI_Datatype datatype, 
                  MPI_Status *status);

int MPI_File_write(MPI_File fh, 
                   void *buf, 
                   int count, 
                   MPI_Datatype datatype, 
                   MPI_Status *status);

int MPI_File_close(MPI_File *fh);

Using PVFS2 in Your Codes

PVFS2 is a new PFS developed by the same team that built the original version of PVFS. The developers found that modifying PVFS1 to add new networking protocols or new storage systems was time consuming, and the resulting code was difficult to maintain. Scaling PVFS1 to thousands or even tens of thousands of nodes was also likely to be a problem. PVFS2 was designed to address these problems: the developers took the opportunity to restructure the code to make it easier to add new networking protocols and new storage techniques.

In redesigning PVFS, the developers also took the opportunity to rethink the APIs for accessing PVFS2. They found that few people used the "UNIX/POSIX-like" I/O compatibility feature, since the speed gain was fairly small when the I/O was not optimized. People were also unwilling to port their applications to a PVFS-specific UNIX-like library just to get a small performance gain, particularly if they wanted to run the code elsewhere. Consequently, the PVFS2 developers are focusing on VFS support and MPI-IO support as the application APIs at this time.

©2005-2023 Copyright Seagrove LLC, Some rights reserved. Except where otherwise noted, this site is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International. The Cluster Monkey Logo and Monkey Character are Trademarks of Seagrove LLC.