From the Spinal Tap "-011" option working group.

Creating code for GPU accelerators is often not a simple task. In particular, if one wants to convert an application to run on GPUs, the resulting code will often look very different from the original. A typical HPC code is probably written in Fortran, C, or C++ and uses OpenMP or MPI (or both) to express parallelism. A GPU version must be rewritten to use either NVidia CUDA or OpenCL. For the average HPC user, this process can be rather daunting and presents a fairly high barrier to GPU use. Fortunately, many popular applications (e.g., Amber) have been ported to NVidia GPUs with great results, although they will only run on NVidia hardware. What about other open applications or user-generated codes?

Compiler maker The Portland Group (now owned by NVidia) recognized this issue and developed the Accelerator Model, which allows existing Fortran and C code to be augmented with "comment hints" so the compiler can automatically create GPU-accelerated code. Most importantly, adding compiler hints as comments preserves the existing code structure (i.e., users can avoid maintaining two versions of their application).

In order to standardize the approach, a multi-vendor organization called OpenACC was formed in 2011. The initial founders were hardware vendors NVidia and Cray and compiler makers Portland Group and CAPS (CAPS has since gone out of business). The recent addition of AMD and compiler vendor Pathscale brought the total number of members to twenty, which includes two US government labs and many universities. Until recently, the OpenACC effort had mostly been perceived as an NVidia/Cray collaboration. As noted, NVidia purchased the Portland Group in 2013. The recent addition of AMD also gives the organization a true multi-vendor flavor. Notably absent are IBM and Intel. Of course, Intel has its own compiler suite and parallel hardware accelerator line (Intel Xeon Phi), which are not as GPU-centric as AMD and NVidia. In addition, it is reported that Intel would like to improve OpenMP to help compilers deal with accelerators. In any case, the use of compiler hints, or pragma directives, provides the best and easiest way to get existing (and new) codes running on GPU-accelerated technologies. Indeed, the AMD APU processors will need this type of tool to take advantage of the on-die GPU processors (in HPC terms, an on-board SIMD accelerator).

Obviously, The Portland Group and Pathscale will provide OpenACC support in their compilers, but one looming question is the inclusion of OpenACC in GCC. As reported by Phoronix, the GCC port is not without its detractors. The current plan, however, is for OpenACC to land in GCC version 5.

Fortunately for those who are impatient, the OpenUH compiler from the University of Houston can be used to try OpenACC (version 1) applications on GPU-assisted machines. OpenUH is an open source, optimizing compiler suite for C, C++, and Fortran, based on Open64. It supports a variety of architectures including x86-64, IA-32, IA-64, MIPS, and PTX (NVidia intermediate code).

OpenUH also extends the Open64 OpenMP implementation by adding support for nested parallelism and the tasking features introduced in OpenMP 3.0. To achieve portability, OpenUH is able to emit optimized C or Fortran 77 code that may be compiled by a native compiler on other platforms. The supporting run-time libraries are also portable: the OpenMP run-time library is based on the portable Pthreads interface, while the Co-array Fortran run-time library is based on the portable GASNet (or, optionally, ARMCI) communication interfaces.

Further information can be found in several OpenUH publications including:

Of particular note is the performance on the NAS Parallel Benchmarks (the NAS suite is often used to test parallel performance across a range of computational kernels). The following diagram, from the third paper listed above, shows the results the OpenUH team was able to achieve using OpenACC pragmas.

Figure One: Results for NPB-ACC versus NPB-CUDA (Taken from "NAS Parallel Benchmarks for GPGPUs using a Directive-based Programming Model," see above)

The continued development of OpenACC will enable many more GPU-accelerated applications for HPC. Just as OpenMP provided an easy way for applications to use multiple cores, OpenACC will offer similar functionality for accelerators. Thanks to OpenUH, there is an open compiler available for those who want to try OpenACC today.