A Tool for Cluster Performance Tuning and Optimization
 Published on Wednesday, 14 September 2005 20:00
 Written by Douglas Eadline
 Hits: 26108
Article Index
In Case of Problems
You should have minimal or no problems with the single machine tests. As more machines become involved with the tests, there is room for more configuration errors to arise. If a test does not run, check the "test_name.log" file in the log directory (the default is bpslogs). In the case of the NAS tests, the results are in the form npb.COMPILER.MPI.CLASS.PROCESSORS.In general, if you have problems with a test it may be best to examine the results in the log file. In the case of the NAS suite, the "k" option will keep the npb tests directory in the log directory so you can run the tests directly if you have problems. There is a script called run_suite in the npb directory that will run the tests (run_suite h ). Also the README.bps file in the npb directly should provide more information on how the tests are run and how to resolve possible problems.
Working with the NAS Benchmark Suite
The NAS suite will probably produce the most problems for end users. The main script run_suite is designed to "wrap" around and hide the various issues with the different MPI's and compilers. While run_suite does an adequate job, it certainly can not predict the potential software environments on a cluster. The NAS suite is run from the command line. The following options are required. If you are using run_suite directly you need to list the machine names in the npb/cluster/machines file (one per line). This file is used by MPI to start your programs. The options are:Usage: ./run_suiteOptions: v verbose output from make stage (default=make.log) c <compiler> compiler (gnu/pgi/intel) n <processors> number of processors t <test> test size (A,B,C,S) m <mpi> mpi version(lam,mpich,mpipro,dummy) o only build programs h show this help To run on a single CPU use: 'c gnu n 1 t S m dummy'
If you have problems producing the binary files, consult the make.log file for a complete listing for the make process. Often an error will cause all the builds to fail, so fix one problem at time and rerun the build.
Future Plans
The BPS suite is showing its age. Virtually all the functionality of the BPS suite is being converted into the CMBP (Cluster Monkey Benchmarking Project).Acknowledgments
I wish to thank and acknowledge all the authors of the tests suites used in this package. See Sidebar Two (below) for more information on each test.
Sidebar One: Description of the NAS tests 
BT is a simulated CFD application that uses an implicit algorithm to solve 3dimensional (3D) compressible NavierStokes equations. The finite differences solution to the problem is based on an Alternating Direction Implicit (ADI) approximate factorization that decouples the x, y, and z dimensions. The resulting systems are BlockTridiagonal of 5x5 blocks and are solved sequentially along each dimension. SP is a simulated CFD application that has a similar structure to BT. The finite differences solution to the problem is based on a BeamWarming approximate factorization that decouples the x, y, and z dimensions. The resulting system has scalar Pentadiagonal bands of linear equations that are solved sequentially along each dimension. LU is a simulated CFD application that uses symmetric successive overrelaxation (SSOR) method to solve a seven block diagonal system resulting from finite difference discretization of the NavierStokes equations in 3D by splitting to into block Lower and Upper triangular systems. FT contains the computational kernel of a 3D fast Fourier Transform (FFT)based spectral method. FT performs three one dimensional (1D) FFT's, one for each dimension. MG uses a Vcycle MultiGrid method to compute the solution of the 3D scalar Poisson equation. The algorithm works continuously on a set of grids that are made between coarse and fine. It tests both short and long distance data movement. CG uses a Conjugate Gradient method to compute an approximation to the smallest eigenvalue of a large, sparse, unstructured matrix. This kernel tests unstructured grid computations and communications by using a matrix with randomly generated locations of entries. EP is an Embarrassingly Parallel benchmark. It generates pairs of Gaussian random deviates according to a specific scheme. The goal is to establish the reference point for peak performance of a given platform. EP is almost independent of the interconnect as communication is minimal. IS in a parallel integer sort algorithm that is very sensitive to latency of the interconnect. 
{mosgoogle right}
Sidebar Two: Additional Benchmark Information 

 << Prev Page
  Next Page