Parallel Programming

We are going to be honest here. Writing parallel codes is not simple. You can learn the mechanics of writing MPI codes from Jeff Squyres MPI Monkey column,but what about the computer science? Or more specifically, how are you going to make sure your code runs faster on multiple processors? Join Pavel Telegin and Douglas Eadline as they explain the issues and the answers.

From the Spinal Tap "-011" option working group.

Creating code for GPU accelerators is often not a simple task. In particular if one wants to convert an application to run on GPUs, the resultant code will often look very different than the original. If the application were a typical HPC code it would probably be written in Fortran, C, or C++ and use OpenMP or MPI (or both) to express parallelism. A GPU version must be rewritten to use either NVidia CUDA or OpenCL. For the average HPC user, this process can be rather daunting and present a fairly high barrier to GPU use. Fortunately, many of the popular applications (e.g. Amber) have been ported to use NVidia GPUs with great results, although they will only run on NVidia hardware. What about other open applications or user generated codes?

Yes, mowing ones lawn and HPC have much in common

In many of my previous columns I mentioned Amdahl's Law -- the golden rule of parallel computing. Before you click away, rest assured I have no intention of talking specifically about Amdahl’s Law and I promise not to place a single equation or derivation in this column. Often times people are put off by Amdahl’s law. Such discussions usually start with an equation and talk of the limit as N goes to infinity. Not to worry. There are no formulas, no esoteric terms (sorry, no big words), just the skinny on the limits of parallel computing. I’ll even go one further, I’ll hardly mention parallel computers, multi-core, and other such over worked topics. In this article, I’ll discuss lawn care.

You really should meet Julia

A long time ago, BASIC was "the language" in the PC world. There were other languages of course, but BASIC was well, "basic" and it was the only thing beyond machine code for many early PC enthusiasts. The name was an acronym for "Beginner's All-purpose Symbolic Instruction Code." New users could easily start writing programs after learning a few commands. It was fun and easy. Many scientists and engineers taught themselves BASIC. Of course, many would argue at the time real scientific programs were written in BASIC's big brother FORTRAN, but FORTRAN was a different world in terms of development cycle and hardware. There was this thing called a compiler that had to be run before you could execute your program. BASIC on the other hand seemed to keep track of your code and would just "RUN" whenever you wanted. Of course you might get some errors, but the "edit, run, edit" cycle was rather short and allowed one to easily play with the computer.

A hard to swallow conclusion from the table of cluster

A confession is in order. The last two (1, 2) installments of this column have been a sales pitch of sorts. If you believe some of the things I talked about, large clusters will break, applications will need to tolerate failure and be easy to write, then you may agree that dynamic programming algorithms are one method to solve these problem. The next question is, how do implement these ideas?

The answer is the part many may find distasteful. If you are one of the brave few who take pause to think about how we are going achieve pervasive cluster (parallel) computing, then take a bite of this column. The rest of you weenies should at least nibble at the edges.

Dynamic Parallel Execution: Losing control at the high end

In a past column, I talked about programming large numbers of cluster nodes. By large, I mean somewhere around 10,000. If you have been following along, at the end of the article I had promised to mention some real alternatives to MPI and even suggest some wild ideas. I plan to keep my promise, however, I wanted to take a slight detour this month and develop the solution a bit further. One point of note before we begin. To keep things simple, I will refer to cluster nodes as if they were a single processing unit.


Login And Newsletter

Create an account to access exclusive content, comment on articles, and receive our newsletters.


Share The Bananas

Creative Commons License
©2005-2016 Copyright Seagrove LLC, Some rights reserved. Except where otherwise noted, this site is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License. The Cluster Monkey Logo and Monkey Character are Trademarks of Seagrove LLC.