One step closer to the "-gpu" option
I'm sure by now everyone has heard that you can run real code on GPUs
(Graphics Processing Units). GPUs are the graphics cards in your desktop
or even the graphics engines running your game consoles at home (never
at work - right?). The potential performance improvement for codes or
algorithms that can take advantage of the GPU's programming model and
do most of their computation on the GPU is enormous. There are cases of
over 100X performance improvement for some codes running on GPUs relative to
CPUs.
But there are some limitations to using GPUs for computation. One of the
critical limitations is that you have to take your code and rewrite it for
the GPU, as in the case of Brook+ from AMD or OpenCL from the Khronos Group.
Alternatively, you may have to "adapt" your C code to use some extra
functions and data types (extensions), as in the case of CUDA from NVIDIA.
Unfortunately, you can't just take your existing code and use a compile
option such as "-gpu" to magically build code for the GPU... or can you?
Writing Code for GPUs
I'm not sure how many people have written code or tried to write code for
GPUs (I will use GPU in place of GP-GPUs because it's easier on my carpal
tunnel symptoms), but in general it's not as easy as it appears. If you are
porting your application to GPUs then you have to take your code, understand
the algorithms reasonably well, and then determine places where you think
GPUs will shine. Then you have to either (1) rewrite the entire code, or (2)
rewrite targeted portions of the code, or (3) port the desired portions
of the code to a new language. For newly written code, you can take your
algorithm and frame it into a SIMD (Single Instruction, Multiple Data)
context, and then write code. For old code, written long before GPUs, or
for new code, writing for GPUs is not easy. Let's
take a look at what tools are available for writing GPU code.
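
To make "framing an algorithm in a SIMD context" a little more concrete, here is a minimal sketch in plain C. The function names (saxpy_element, saxpy) are hypothetical and chosen only for illustration; nothing here is GPU code. The point is the shape of the computation: one small piece of work applied to every index, with no iteration depending on another.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-element "kernel": the same operation is applied to
   every index i, and no index reads a value written by another index. */
void saxpy_element(size_t i, float a, const float *x, float *y)
{
    y[i] = a * x[i] + y[i];
}

/* On a CPU we simply loop over the index space. On a GPU, each index
   would typically be handled by its own thread instead of a loop. */
void saxpy(size_t n, float a, const float *x, float *y)
{
    assert(x != NULL && y != NULL);
    for (size_t i = 0; i < n; ++i)
        saxpy_element(i, a, x, y);
}
```

Calling saxpy(4, 2.0f, x, y) updates each y[i] to 2*x[i] + y[i]. Loops that cannot be split into independent per-element pieces like this (for example, a running sum where each step needs the previous result) are the ones that take real algorithmic rework before a GPU can help.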
If you want to get really hard core you can actually write GPU code using
OpenGL. It is an API
(Application Programming Interface), or language if you will, that allows
you to write applications that run on GPUs. Originally OpenGL was designed to
be used for writing 2D and 3D computer graphics applications. But people
have discovered that you can use it to run general programs that aren't
necessarily graphically oriented. But you have to be able to "code" in
OpenGL and write your algorithms using it. There are some simple tools that
can help you get started, but in general, you have to think in graphical
concepts such as textures, shaders, etc., and be able to express your
algorithm in terms of these concepts using OpenGL. I like to think of this
as the "Assembler Language approach" to coding for GPUs. That is, you are
down in the low level bowels of the language and the hardware to effectively
write and run code on GPUs. In addition, such low-level approaches can limit
the portability of the code from one platform to another.
While it is still very difficult for non-graphical programmers to write
OpenGL code or for OpenGL coders to think about non-graphical algorithms,
there are some application success stories. You can try this link or this
link to read about some successful OpenGL applications that people have written.
Fairly early on people realized that GPUs, while showing huge potential,
were not going to have widespread adoption given that they were so
difficult to write code for. So higher level languages were developed.
There is a whole laundry list of languages and I won't go over them here.
But here is a list of the higher level languages and libraries that people
are using or have used to write code for GPUs:
- Cg - Developed by NVIDIA. Also see this link
- Shallows
- Sh
- BrookGPU
- GPU++
- PeakStream (bought by Google - nothing since then)
- Rapidmind
- Brahma - Runs on .net - MS only.
- GPULib - Plugins for IDL and MATLAB.
- CTM - Created by AMD (ATI). Deprecated in favor of CAL.
- CUDA - Developed by NVIDIA
  - CUDA Plugins:
    - PyCUDA - Python wrappers for functions that run on the GPU.
    - MATLAB plug-in for CUDA - Similar to PyCUDA, allowing you to have functions run on GPUs.
    - Mathematica CUDA Plugin - There has also been a Press Release that Wolfram will release a version of Mathematica that can run code on GPUs.
    - Flagon - A Fortran library that interfaces to CUDA numerical routines that run on GPUs.
  - Dr. Dobbs Magazine recently published a very extensive series on using CUDA. Well worth reading.
  - Accelereyes - Makes a Matlab plugin called Jacket that allows Matlab to be used with GPUs without having to write your own MEX files.
- Brook+ - Development of BrookGPU by AMD.
- OpenCL - Very new, allows for both multi-core and GPU programming, but it is rather low level.
For all of these languages and libraries, you will need to rewrite or port
your application. The degree of severity varies depending upon the specific
option. Arguably, CUDA is one of the easiest because you can take existing C
code and add GPU code to it, along with some function calls that move data
between the host CPU and the GPU.
On the other hand, languages and libraries such as BrookGPU, Brook+, and even OpenCL
will require you to rewrite much of your application. However, the developers have tried
to make them as close as possible to C. Some languages and libraries are available
under various open-source licenses. Others are freely available but are not open-source.
Then there are others that are commercial products.
Regardless of the language or package chosen, the amount of work that goes
into porting or rewriting varies. I view all of the previously mentioned languages
as something like Assembler+. That is, a step above something like Assembler,
but not nearly the same as C, C++, or Fortran.
What developers really want is to continue to use their current development
tools for developing for GPUs. They don't want to have to rewrite codes or
learn new languages. They may adapt their codes somewhat, perhaps a small
amount. But overall they just want to build codes for GPUs using their existing
development tools and existing code base (as much as possible).
The Evolution of GPU Tools and Developers
Developers are looking for something easy or automatic that helps
them run their code on GPUs. This is what I've been referring to as the
magic "-gpu" option. The idea is that the compiler is all-seeing and all-knowing
so that it can inspect your code, find the parts that look like SIMD-appropriate
code, and create a CPU/GPU binary. I think people also want to be able to eat
anything they want without gaining weight or endangering their health (at least
that's my dream). But the point is that this is an almost impossible dream.
However, we can move down the path in that direction.
This situation is not without precedent. If you've been around a few years
you may remember the rise of the vector processor. At first, developers
had to rewrite their codes to utilize vector processors.
At the same time, the compiler vendors modified their compilers to help
developers recognize opportunities for vectorization as well as create good
vectorized code. Over time, developers got better at writing vector code
and the compilers became better at recognizing vector opportunities and
generating really good vector code. After several years, the result was
developers who, on average, understood pretty well how to write
vector code and were armed with good compilers that could recognize vector
code opportunities and generate very good vector code. In addition, the compilers
produced code that performed well enough that developers no longer had to resort
to the assembler codes they first used to achieve a good portion of the
potential performance. It took several iterations between developers and
compiler creators to get to the end result.
A better review of the history of vector compilers was written by
Michael Wolfe from PGI (The Portland Group) for Linux Journal.
In many ways we are following the same steps with GPUs that we followed with
vector processors. We are at the beginning of the cycle, where we once were with
vector compilers and code development. We have some early tools for developing codes for GPUs
and developers are just starting to develop and, more importantly, understand
how to develop codes for GPUs. But the next step in the evolution
of tools (compilers) for GPUs was recently taken by The Portland Group.
PGI 8.0 Technology Preview
The Portland Group recently announced that they will have a technology
preview version of their new 8.0 compilers. The preview will be given to a restricted
group of testers initially and then expanded to other developers over time. The
first customers should see the preview in early 2009.
So what is so special about this announcement from The Portland Group?
I'm glad you asked :) What PGI has done is to add pragmas,
or compiler directives, to their compilers. These pragmas allow the compiler
to analyze the code and generate GPU
code that is then sent to the NVIDIA compiler (the CUDA compiler is part of the freely available CUDA software).
The PGI compiler continues to compile the CPU-based code and links it to the
CUDA-built GPU binary. Then you execute the resulting combination.
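
To give a feel for what directive-based GPU code can look like, here is a minimal sketch in C. Big caveat: the directive syntax of the PGI 8.0 preview is not shown on this page, so the sketch uses an OpenACC-style "#pragma acc" line purely as a stand-in assumption (OpenACC is a directive standard that appeared after this announcement). The function name scale_add is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* ASSUMPTION: the directive below is OpenACC-style and only a stand-in;
   it is not claimed to be the PGI 8.0 preview syntax. A compiler without
   OpenACC support ignores the pragma and runs the loop on the CPU. */
void scale_add(int n, float a, const float *x, float *y)
{
    assert(n >= 0 && x != NULL && y != NULL);

    /* Ask the compiler to offload the loop: copy x to the device,
       and copy y to the device and back to the host afterwards. */
#pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

The attraction of this style is that the loop body stays ordinary C. Remove or ignore the directive and you still have a correct CPU program, which is about as close to a "-gpu" switch as an existing code base is likely to get.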
Since we're all geeks here (well, at least I am), let's look at some
details of how you code GPUs now and what PGI's announcement does for
us.