Parallel Programming

We are going to be honest here. Writing parallel codes is not simple. You can learn the mechanics of writing MPI codes from Jeff Squyres' MPI Monkey column, but what about the computer science? Or more specifically, how are you going to make sure your code runs faster on multiple processors? Join Pavel Telegin and Douglas Eadline as they explain the issues and the answers.

But it does not stop me from asking

Fifteen years ago I wrote a short article in a now-defunct parallel computing magazine (Parallelogram) entitled "How Will You Program 1000 Processors?" Back then it was a good question that had no easy answer. Today, it is still a good question that still has no easy answer. Except now it seems a bit more urgent as we step into the "multi-core" era. Indeed, when I originally wrote the article, using 1000 processors was a far-off but real possibility. Today, 1000 processors are a reality for many practitioners of HPC. As dual cores hit the server rooms, effectively doubling the processor counts, many more people will be joining the 1000P club very soon.

Static or Dynamic? It is a matter of balance

Now that we know how to identify the parallel parts of our program, the question is what to do with this knowledge. In other words, how do you write a parallel program? To answer this question, we will discuss what the structure of a parallel program may look like. Programs can be organized in different ways. We already discussed the SPMD (Single Program Multiple Data) and MPMD (Multiple Programs Multiple Data) models. SPMD and MPMD represent the way a program looks from the point of view of the cluster. Note that using an MPMD model with MPI requires an "app" or "procgroup" file to start different programs on the cluster nodes. Let's see what the programs look like from the implementation standpoint.
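To make the "matter of balance" concrete, here is a minimal Python sketch that compares two ways of handing out work. It reads the heading's "static or dynamic" as a question about how tasks are assigned to processors, which is an assumption about the article's content. The function names, the task costs, and the worker count are all hypothetical, and the model ignores communication time entirely.

```python
import heapq

def static_makespan(costs, workers):
    """Static: cut the task list into contiguous blocks before running."""
    n = len(costs)
    block = -(-n // workers)  # ceiling division
    loads = [sum(costs[i:i + block]) for i in range(0, n, block)]
    return max(loads)  # finish time of the busiest worker

def dynamic_makespan(costs, workers):
    """Dynamic: hand each task to whichever worker becomes free first."""
    free_at = [0] * workers  # time at which each worker is next idle
    heapq.heapify(free_at)
    for cost in costs:
        start = heapq.heappop(free_at)       # earliest idle worker
        heapq.heappush(free_at, start + cost)
    return max(free_at)

# Hypothetical task costs: a few expensive tasks followed by cheap ones.
costs = [8, 7, 6, 5, 1, 1, 1, 1]
workers = 2

static_time = static_makespan(costs, workers)    # blocks of 26 and 4 units
dynamic_time = dynamic_makespan(costs, workers)  # both workers finish at 15

print("static :", static_time)
print("dynamic:", dynamic_time)
```

With uneven task costs the static split leaves one worker idle most of the time, while the dynamic assignment evens out the load. On a real cluster the dynamic scheme pays for this with extra communication, which this sketch does not model.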

It All Depends on the Dependencies

In this article, we continue our series on writing parallel programs with a discussion of how to determine whether a program has concurrent sections by looking at the code. The formal way to do this is to determine flow dependence. In fact, the conditions for concurrency are basically the same as those for reordering instructions. When instructions can be reordered, they can be executed concurrently. Here is a simple example of two Fortran statements that are concurrent.
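The summary stops before the Fortran statements themselves. As a stand-in, the following Python sketch shows one common way to check the reordering condition by comparing the variables each statement reads and writes (often called Bernstein's conditions). It is not the article's example: the statements, the variable names, and the `can_run_concurrently` helper are hypothetical, and the anti and output dependence checks go beyond the flow dependence named in the text.

```python
def can_run_concurrently(first, second):
    """Return True if two statements can be reordered or run concurrently.

    Each statement is a dict with 'reads' and 'writes' sets of variable names.
    """
    flow = first["writes"] & second["reads"]     # second reads what first wrote
    anti = first["reads"] & second["writes"]     # second overwrites what first reads
    output = first["writes"] & second["writes"]  # both write the same variable
    return not (flow or anti or output)

# Hypothetical statements:
#   s1:  x = a + b
#   s2:  y = c * d
#   s3:  z = x + 1    (reads the x written by s1)
s1 = {"reads": {"a", "b"}, "writes": {"x"}}
s2 = {"reads": {"c", "d"}, "writes": {"y"}}
s3 = {"reads": {"x"}, "writes": {"z"}}

independent = can_run_concurrently(s1, s2)  # no shared variables
dependent = can_run_concurrently(s1, s3)    # flow dependence on x

print("s1 and s2 concurrent:", independent)
print("s1 and s3 concurrent:", dependent)
```

The check is deliberately conservative: it works on variable names only, so it cannot see that two writes to different elements of the same array are independent.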

More parallel secrets; speedup, efficiency and your code

In the previous column, we found that parallel programs for clusters have very subtle differences and their efficiency requires careful examination of the code. In this article, we will see what a typical parallel program looks like and how it is executed on a cluster. Be warned, however, there is a bit of gentle mathematics in this column. It will not hurt, we promise.
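For readers who want a preview of that mathematics, the two quantities in the title are usually defined as follows: speedup is the serial run time divided by the parallel run time, and efficiency is the speedup divided by the number of processors. The short Python sketch below uses these standard definitions. The timings are made-up numbers for illustration, not measurements from the column.

```python
def speedup(t_serial, t_parallel):
    """Speedup: how many times faster the parallel run is."""
    return t_serial / t_parallel

def efficiency(t_serial, t_parallel, processors):
    """Efficiency: speedup per processor, 1.0 being ideal."""
    return speedup(t_serial, t_parallel) / processors

# Hypothetical timings in seconds: one processor versus eight.
t1 = 120.0
t8 = 20.0

s = speedup(t1, t8)        # 120 / 20 = 6.0
e = efficiency(t1, t8, 8)  # 6.0 / 8 = 0.75

print("speedup   :", s)
print("efficiency:", e)
```

An efficiency below 1.0, as here, means that part of each processor's time goes to something other than useful computation, typically communication or waiting.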

Want to parallelize your code? Before you dive in, you might want to test the waters.

It can be said that writing parallel code is easy. It can also be said that getting your code to run fast and produce correct answers is a bit more difficult. With parallel computing, we don't have the luxury of a compiler optimizing for us, so we have to do the work. In this column we are going to look at some of the fundamentals and hopefully get you thinking about some of the issues that are critical for your success.

A cluster is a set of independent computers connected by a communication network. The difference between clusters and multiprocessor computers (SMP systems) is the memory structure. Multiprocessors contain several CPUs that are connected in some way to memory. They can be connected by a bus or a crossbar, but the most important thing is that all processors are connected to all the memory. This configuration is called shared memory. With a cluster, each node has its own memory that is local to that node. Other nodes can only access this memory through the communication network. This configuration is called distributed memory, and it is what makes programming a cluster different from programming a shared memory computer. Accessing the memory of other nodes incurs a substantial time delay. Because of this delay, a program that runs well on a multiprocessor will not necessarily run well on a cluster. Of course, each node can be, and often is, a multiprocessor, but the key to cluster programming is programming distributed memory systems, because the limiting factor is the communication speed.



Creative Commons License
©2005-2016 Copyright Seagrove LLC, Some rights reserved. Except where otherwise noted, this site is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License. The Cluster Monkey Logo and Monkey Character are Trademarks of Seagrove LLC.