Is Your Code Concurrent? | Parallel Programming

Home

Columns

Parallel Programming

Is Your Code Concurrent?

Details: Written by Pavel Telegin; Published: 26 April 2006; Hits: 11013

Article Index

Page 1 of 2

It All Depends on the Dependencies

In this article, we continue our series on writing parallel programs with a discussion on how to determine if a program has concurrent sections by looking at the code. The formal way to do this is to determine flow dependence. In fact, conditions for concurrency are basically the same as for reordering instructions. When instructions can be reordered, they can be executed concurrently. Here is simple example of two Fortran statements that are concurrent.

  X = Y + 1
  Z = Y + 2

There are three basic cases when statements like this are not current.

The first is the case when data produced by one instruction is used by another subsequent instruction. This condition is called data flow dependence (or true or forward dependence). For example, in the case below the first assignment is not concurrent because the value produced by the first statement is used by the second statement.
```
  X = Y + 1 
  Z = X + 1 
```
The second case is the case when data, produced by one statement, is rewritten by another, so the wrong data may be used. This condition is called output dependence. For example, in the case below, these assignments are not concurrent because the X values could be computed in two separate locations.
```
  X = Y + 1
  Z = 2 * X
  X = Y + 2
```
The third case is the case when data produced by an instruction is used by a previous instruction. This condition is called backward dependence. For example, in the case below, the first line is not concurrent because the X value could be computed prior to its use in the first statement.
```
  Z = X + 1
  X = Y + 1
```

In truth, only the first case truly inhibits concurrency. Flow dependence is a property of the algorithm. Output and backward dependencies do not depend on the algorithm and are due to the implementation method. These types of dependencies can be avoided by renaming variables.

Sidebar One: Understanding Concurrent and Parallel

As was mentioned in the first column, the terms concurrent and parallel are often used interchangeably. In the context of this column, they are not the same thing. As we state, concurrency is a property of a program or algorithm. If parts of the program can run independently, then these parts are concurrent. If the independent parts are run on separate processors then the program is often called parallel.

For example, consider case two above. A concurrent version, where both the first and third lines are independent.

  X1 = Y + 1
  Z = 2 * X1
  X = Y + 2

Similarly, example three can be written as follow.

  Z = X1 + 1
  X = Y + 1

In some cases, using clusters it is not even necessary to rename variables. You only need to make sure that the data produced later is used in the program. For example, with output dependence, case two can be performed as the following.

process 1 
  X = Y + 1 	
  Z = 2 * X  
  receive X from process 2 

process 2
  X = Y + 2  
send X to process 1

Similarly, backward dependence could be handled as follows.

process 1
  Z = X + 1
  receive X from process 2

process 2
  X = Y + 1	
  send X to process 1

In addition to data dependencies, there is control dependence. Branches in the program cause this type of dependence. As branches cannot be reordered with other instructions, they cannot be executed in parallel. For example, the following code should not be reordered.

IF(X. GT. 0) THEN
  Y = -X
  Z = X
ELSE
  Y = X
  Z= -X
ENDIF

Of course, reordering IF-ELSE-ENDIF instructions changes the algorithm of the program. One ways to make this execute in parallel is to replicate the statements. The decision point can be rewritten as follows.

process 1
  IF(X. GT. 0) THEN
    Y = -X
  ELSE
    Y = X
  ENDIF
  send/receive

process 2
  IF(X. GT. 0) THEN
    Z = X
  ELSE
    Z= -X
  ENDIF
  send/receive

Note that this particular code, if executed in parallel would probably be very inefficient because little data is computed and a send/receive would be needed.

File operations typically are not concurrent because they use modified file pointers. Parallel read/write operations require support from the operating system. Furthermore, GOTO's and STOP instructions cannot be executed concurrently.

Checking flow dependencies can be used for algorithms as well to make sure that a concurrent algorithm is really concurrent. Analyzing data dependencies can also aid in the decision of what parts of program are concurrent. To make code efficient it is reasonable to consider sets of instructions instead of single instructions. these sets can then be considered concurrent. For example these two IF-THEN-ENDIF constructions are independent.

IF(T .GE. TS) THEN
  T = 0.
ENDIF
IF(X .GE. XS) THEN
  X = 0.
ENDIF

This code sequence can be written as;

process 1
  IF(T .GE. TS) THEN
    T = 0.
  ENDIF	

process 2
  IF(X .GE. XS) THEN
    X = 0.
  ENDIF

Concurrent Loops

The most important structure for parallelization is the DO loop (or for loop). Loops are widely used because they provide a large number of iterations and when iterations are concurrent this can produce a parallel section of code. In addition, it is easier to balance the load for parallel programs using regular objects like loops. The most important thing to check in a loop is if subsequent iterations depend on data from previous iterations. If this happens, then it is said that there are loop-carried dependencies. The distributed memory model used by clusters allows one to avoid conflicts with local data. For instance, in the example below, all iterations can be executed concurrently.

 DO I = 1, N
   TMP = X(I) * Y(I)
   Y(I) = TMP - 1.
 ENDDO

As we have pointed out in previous columns, when concurrent segments are detected it does not mean that they can be efficiently executed in parallel. Overhead caused by data transfer may limit or exceed the gain from parallelization. Consider for example the following loop.

 DO I = 1, 1000
   X(I) = X(I) + 1.
 ENDDO

If this loop is executed in parallel, then data transfers may require more time than the loop needs to complete on a single processor.

You have no rights to post comments

JComments

Main Menu

Search

Login And Newsletter

Feedburner

Subscribe Now!

Front Page RSS Feed

This work is licensed under CC BY-NC-SA 4.0

©2005-2023 Copyright Seagrove LLC, Some rights reserved. Except where otherwise noted, this site is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International. The Cluster Monkey Logo and Monkey Character are Trademarks of Seagrove LLC.