From the Long Live Fortran department
Two recent stories should bode well for the HPC market. First, OpenMP.org released the new OpenMP 4.0 Specification. The specification adds many new features, including support for accelerators (GPUs and Intel Phi) and for SIMD math units on processors. A longer description can be found below (with a link to the specification page). The OpenMP API provides a set of directives (special comments in Fortran, "pragmas" in C/C++) that can be placed into existing code. A compiler without OpenMP support simply ignores them, so the annotated source remains usable on other systems.
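As a rough illustration of the directive approach, here is a minimal C sketch. The function name `sum_array` is made up for this example; the `parallel for` directive and `reduction` clause are standard OpenMP.

```c
/* Sum the elements of x[0..n-1].
 * With OpenMP enabled (for example, -fopenmp on GCC), the loop iterations
 * are divided among threads and the per-thread partial sums are combined.
 * Without OpenMP, the pragma is ignored and the loop runs serially. */
double sum_array(const double *x, int n)
{
    double total = 0.0;

    #pragma omp parallel for reduction(+:total)
    for (int i = 0; i < n; ++i) {
        total += x[i];
    }
    return total;
}
```

Apart from the single `#pragma` line, this is ordinary C. Called on the array {1, 2, 3, 4}, it returns 10 with or without OpenMP.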
The second story is that GPU/HPC vendor NVidia has bought Portland Group. Portland Group (PGI), known for its high performance compilers, has developed Fortran and C/C++ compilers that can directly address NVidia GPUs. (AMD GPUs were mentioned as well, but it is assumed this support is going away.) Their technology, which is similar to the OpenMP approach, has brought GPU performance to many existing applications. While some consider the NVidia purchase a reduction of choice in the market, it probably signals a much stronger move toward standardization via OpenMP.
The announcement of OpenMP 4.0 means that end-users can continue to operate at the "Fortran and C/C++" level and not have to look for custom programming methods to use new hardware. In other words, they don't need to use languages like CUDA or OpenCL, which may require large amounts of re-programming. Indeed, the NVidia acquisition of PGI is a signal that NVidia believes the future of GPU programming for HPC lies in Fortran and C/C++. Taken together with the OpenMP announcement, this suggests that CUDA may be taking a back seat to the stalwart traditional compilers. A detailed OpenMP announcement follows.
The OpenMP Consortium has released OpenMP API 4.0, a major upgrade of the OpenMP API standard language specifications. Besides several major enhancements, this release provides a new mechanism to describe regions of code where data and/or computation should be moved to another computing device.
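To make the "moved to another computing device" idea concrete, here is a hedged C sketch using the `target` construct and `map` clauses from the 4.0 specification. The function `saxpy` is an illustrative name, not something from the announcement, and whether the loop actually runs on an accelerator depends on the compiler and hardware; when no device is available, execution falls back to the host.

```c
/* Compute y[i] = a * x[i] + y[i] for i in [0, n).
 * map(to: ...)     copies x to the device before the region runs.
 * map(tofrom: ...) copies y to the device and back to the host afterwards.
 * If the compiler ignores the pragmas, this is a plain serial loop. */
void saxpy(int n, float a, const float *x, float *y)
{
    #pragma omp target map(to: x[0:n]) map(tofrom: y[0:n])
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

The array sections `x[0:n]` and `y[0:n]` tell the runtime how much data to transfer, since a bare pointer carries no length.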
Bronis R. de Supinski, Chair of the OpenMP Language Committee, stated that “OpenMP 4.0 API is a major advance that adds two new forms of parallelism in the form of device constructs and SIMD constructs. It also includes several significant extensions for the loop-based and task-based forms of parallelism already supported in the OpenMP 3.1 API.”
The 4.0 specification is now available on the OpenMP Specifications page.
With this release, the OpenMP API specification, the de facto standard for parallel programming on shared memory systems, continues to extend its reach beyond pure HPC to include DSPs, real time systems, and accelerators. The OpenMP API aims to provide high-level parallel language support for a wide range of applications, from automotive and aeronautics to biotech, automation, robotics and financial analysis. New features in the OpenMP 4.0 API include:
- Support for accelerators. The OpenMP 4.0 API specification effort included significant participation by all the major vendors in order to support a wide variety of compute devices. OpenMP API provides mechanisms to describe regions of code where data and/or computation should be moved to another computing device. Several prototypes for the accelerator proposal have already been implemented.
- SIMD constructs to vectorize both serial and parallelized loops. With the advent of SIMD units in all major processor chips, portable support for accessing them is essential. OpenMP 4.0 API provides mechanisms to describe when multiple iterations of the loop can be executed concurrently using SIMD instructions and to describe how to create versions of functions that can be invoked across SIMD lanes.
- Error handling. OpenMP 4.0 API defines error handling capabilities to improve the resiliency and stability of OpenMP applications in the presence of system-level, runtime-level, and user-defined errors. Features to abort parallel OpenMP execution cleanly have been defined, based on conditional cancellation and user-defined cancellation points.
- Thread affinity. OpenMP 4.0 API provides mechanisms to define where to execute OpenMP threads. Platform-specific data and algorithm-specific properties are separated, offering a deterministic behavior and simplicity in use. The advantages for the user are better locality, less false sharing and more memory bandwidth.
- Tasking extensions. OpenMP 4.0 API provides several extensions to its task-based parallelism support. Tasks can be grouped to support deep task synchronization and task groups can be aborted to reflect completion of cooperative tasking activities such as search. Task-to-task synchronization is now supported through the specification of task dependency.
- Support for Fortran 2003. The Fortran 2003 standard adds many modern computer language features. Having these features in the specification allows users to parallelize Fortran 2003 compliant programs. This includes interoperability of Fortran and C, which is one of the most popular features in Fortran 2003.
- User-defined reductions. Previously, OpenMP API only supported reductions with base language operators and intrinsic procedures. With OpenMP 4.0 API, user-defined reductions are now also supported.
- Sequentially consistent atomics. A clause has been added to allow a programmer to enforce sequential consistency when a specific storage location is accessed atomically.
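Three of the features listed above (SIMD constructs, user-defined reductions, and task dependencies) can be sketched in C as follows. This is an illustration under stated assumptions, not code from the announcement: the type `point_t` and the functions `scale_shift`, `transform`, `point_sum`, `point_total`, and `ordered_tasks` are hypothetical names. The directives themselves, and the identifiers `omp_in`, `omp_out`, and `omp_priv`, come from the OpenMP 4.0 specification.

```c
/* --- SIMD constructs ---------------------------------------------------- */

/* Ask the compiler to also generate a version callable across SIMD lanes. */
#pragma omp declare simd
double scale_shift(double v, double a, double b)
{
    return a * v + b;
}

/* out[i] = a * in[i] + b. The simd directive asserts that iterations are
 * independent, so the caller must not pass partially overlapping arrays. */
void transform(double *out, const double *in, int n, double a, double b)
{
    #pragma omp simd
    for (int i = 0; i < n; ++i) {
        out[i] = scale_shift(in[i], a, b);
    }
}

/* --- User-defined reduction --------------------------------------------- */

/* Hypothetical 2-D point type, used only for this illustration. */
typedef struct { double x, y; } point_t;

/* Combine two partial sums component-wise. */
point_t point_sum(point_t p, point_t q)
{
    point_t r = { p.x + q.x, p.y + q.y };
    return r;
}

/* Register "point_add" as a reduction over point_t. Each thread's private
 * copy (omp_priv) starts at zero; partial results are merged into omp_out. */
#pragma omp declare reduction(point_add : point_t : \
        omp_out = point_sum(omp_out, omp_in)) \
        initializer(omp_priv = {0.0, 0.0})

point_t point_total(const point_t *p, int n)
{
    point_t acc = {0.0, 0.0};

    #pragma omp parallel for reduction(point_add : acc)
    for (int i = 0; i < n; ++i) {
        acc = point_sum(acc, p[i]);
    }
    return acc;
}

/* --- Task dependencies -------------------------------------------------- */

/* The second task reads a, so depend(in: a) makes it wait for the first
 * task, which declares depend(out: a). */
int ordered_tasks(void)
{
    int a = 0, b = 0;

    #pragma omp parallel
    #pragma omp single
    {
        #pragma omp task depend(out: a) shared(a)
        a = 21;

        #pragma omp task depend(in: a) depend(out: b) shared(a, b)
        b = 2 * a;

        #pragma omp taskwait
    }
    return b;
}
```

Each function gives the same result when the pragmas are ignored, which is the portability argument the specification relies on. Whether the `simd` loop is actually vectorized is up to the compiler.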
"This represents collaborative work by many of the brightest in industry, research, and academia, building on the consensus of 26 members. We strive to deliver high-level parallelism that is portable across 3 widely-implemented common General Purpose languages, productive for HPC and consumers, and delivers highly competitive performance. I want to congratulate all the members for coming together to create such a momentous advancement in parallel programming, under such tight constraints and industry challenges. With this release, the OpenMP API will move immediately forward to the next release to bring even more usable parallelism to everyone." – Michael Wong, CEO OpenMP ARB.