[Beowulf] OpenMP wierdness on dual AMD 2350 box w/ SL5.2 x86_64

Nathan Moore ntmoore at gmail.com
Wed Nov 26 13:32:22 EST 2008


After the help last week on openmp, I got inspired and bought a dual-quad
opteron machine for the department to show 8-way scaling for my students
("Hey, its much cheaper than something new in the optics lab", my dept.
chair laughed).

I've been working on said machine over the past few days and found something
really weird in an OpenMP example program I descrobed to the list.

The machine is a dual-proc AMD Opteron 2350, Tyan n3600T (S2937) mainboard,
w/ 8GB ram.  Initially, I installed the i386 version of Scientific Linux
5.2, but then realized that only half of the RAM was usable, and
re-installed SL5.2 x86_64 this morning.

The example program is appended to the end of this email.  Again, it is a
2-d finite-difference solution to the laplace equation (the context being to
"predict" lightning strikes based on the potential between the ground and
some clouds overhead).

The program scales beautifully up to OMP_NUM_THREADS~6 or 7, but when I set
the number of threads to 8, something weird happens, and instead of taking
the normal 241 iterations to converge, the program converges after 1 step.
This reeks of numerical instability to me, but my programming experience in
x86_64 is very limited.

I'm using gfortran, with the simple compile string,
     gfortran clouds_example_OpenMP.f90 -m64 -fopenmp

Any insight into what obvious mistake I'm making would be wonderful!

The stability of the output seems erratic to me.  Sometimes when
OMP_NUM_THREADS=7 the result converges normally after 241 iterations and at
other times, the result converges after 1 iteration (indicating some sort of
problem with hardware???)

This didn't occur yesterday when the machine was running SL5.2, i386.




Simulation Output:

[nmoore at aykroyd clouds]$ OMP_NUM_THREADS=1
[nmoore at aykroyd clouds]$ export OMP_NUM_THREADS
[nmoore at aykroyd clouds]$ ./a.out
 Hello World from thread           0
 There are           1 threads
...
convergence criteria is \Delta V <   0.250000003725290
  iterations necessary,          241
  initialization time,            0
  simulation time,           57




[nmoore at aykroyd clouds]$ OMP_NUM_THREADS=2
[nmoore at aykroyd clouds]$ export OMP_NUM_THREADS
[nmoore at aykroyd clouds]$ ./a.out
 Hello World from thread           0
 Hello World from thread           1
 There are           2 threads
 ...
convergence criteria is \Delta V <   0.250000003725290
  iterations necessary,          241
  initialization time,            0
  simulation time,           28




[nmoore at aykroyd clouds]$ OMP_NUM_THREADS=4
[nmoore at aykroyd clouds]$ export OMP_NUM_THREADS
[nmoore at aykroyd clouds]$ ./a.out
 Hello World from thread           3
 Hello World from thread           1
 Hello World from thread           0
 Hello World from thread           2
 There are           4 threads
 ...
convergence criteria is \Delta V <   0.250000003725290
  iterations necessary,          241
  initialization time,            0
  simulation time,           14




[nmoore at aykroyd clouds]$ OMP_NUM_THREADS=8
[nmoore at aykroyd clouds]$ export OMP_NUM_THREADS
[nmoore at aykroyd clouds]$ ./a.out
 Hello World from thread           2
 ...
convergence criteria is \Delta V <   0.250000003725290
  iterations necessary,            1
  initialization time,            0
  simulation time,            0

Code listing:

nmoore at aykroyd clouds]$ cat clouds_example_OpenMP.f90
!
!
use omp_lib
!
IMPLICIT NONE
integer,parameter::Nx=2000
integer,parameter::Ny=2000
real*8 v(Nx,Ny), dv(Nx,Ny)
integer boundary(Nx,Ny)
integer i,j,converged,i1,i2
integer t0,t1,t2
real*8 convergence_v, v_cloud, v_ground, max_dv
real*8 bump_P,old_v
real*8 Lx,Ly,dx,dy,v_y
!
real*8 lightning_rod_center, lightning_rod_height
!
real*8 house_center, house_height, house_width
integer num_iterations
!
integer:: id, nthreads
!$omp parallel private(id)
id = omp_get_thread_num()
write (*,*) 'Hello World from thread', id
!$omp barrier
if ( id == 0 ) then
        nthreads = omp_get_num_threads()
        write (*,*) 'There are', nthreads, 'threads'
end if
!$omp end parallel

! initial time
t0 = secnds(0.0)

dx =0.1d0 ! differential lengths, m
dy =0.1d0
Lx = Nx*dx ! system sizes, m
Ly = Ny*dy

print *,"\nSimulation has bounds:\n\tX: 0,",Lx,"\n\tY: 0,",Ly
print *,"\tNx = ",Nx,"\n\tNy = ",Ny
print *,"\tdx = ",dx,"\n\tdy = ",dy

v_cloud = -10000.d0     ! volts
v_ground = 0.d0

! initialize the the boundary conditions
!
! first, set the solution function (v), to look like a
! parallel-plate capacitor
!
!       note that there is one large parallel section and several
!       parallel do's inside that region
!$OMP PARALLEL
!
!$OMP DO
!$OMP& SHARED(Nx,Ny,boundary,v_cloud,v_ground,Ly,dy,v)
!$OMP& PRIVATE(i,j)
do j=1,Ny
        do i=1,Nx
                boundary(i,j)=0
                v(i,j) = v_ground + (v_cloud-v_ground)*(j*dy/Ly)
        end do
end do
!$OMP END DO
!
!$OMP DO
!$OMP& SHARED(Nx,Ny,boundary)
!$OMP& PRIVATE(i)
do i=1,Nx
        boundary(i,1)=1         ! we need to ensure that the edges of
        boundary(i,Ny)=1        ! the domain are held as boundary
end do
!$OMP END DO
!
!$OMP DO
!$OMP& SHARED(boundary,Nx)
!$OMP& PRIVATE(j)
do j=1,Ny
        boundary(1,j)=1
        boundary(Nx,j)=1
end do
!$OMP END DO
!$OMP END PARALLEL


! set up an interesting feature on the lower boundary
!       do this in parallel with SECTIONS directive
!
!$OMP PARALLEL
!$OMP& SHARED(v,boundary,Nx,Ny,dx,dy,Lx,Ly,lightning_rod_height)
!$OMP& PRIVATE(lightning_rod_center,house_center,house_height,house_width))
!$OMP SECTIONS

!$OMP  SECTION
! Lightning_rod
lightning_rod_center = Lx*0.6d0
lightning_rod_height = 5.0d0
call
create_lightning_rod(v_ground,lightning_rod_center,lightning_rod_height,dx,dy,Nx,Ny,v,boundary)

!$OMP  SECTION
lightning_rod_center = Lx*0.5d0
call
create_lightning_rod(v_ground,lightning_rod_center,lightning_rod_height,dx,dy,Nx,Ny,v,boundary)

!$OMP  SECTION
lightning_rod_center = Lx*0.7d0
call
create_lightning_rod(v_ground,lightning_rod_center,lightning_rod_height,dx,dy,Nx,Ny,v,boundary)

!$OMP  SECTION
! house
house_center = 0.4d0*Lx
house_height = 5.0d0
house_width = 5.0d0
call
create_house(v_ground,house_center,house_height,house_width,dx,dy,Nx,Ny,v,boundary)

!$OMP END SECTIONS
!$OMP END PARALLEL

! initialization done
t1 = secnds(0.0)





! main solution iteration
!
! repeat the recursion relation until the maximum change
! from update to update is less than a convergence epsilon,
convergence_v = (0.05)*dabs(v_ground-v_cloud)/(1.d0*Ny)
print *,"\nconvergence criteria is \Delta V < ",convergence_v
num_iterations = 0

!
! iteration implemented with a goto  or a do-while
converged=0
do while( converged .eq. 0)

        converged = 1
        num_iterations = num_iterations + 1
        !$OMP PARALLEL
        !$OMP DO
        !$OMP& PRIVATE(i,j)
        !$OMP& SHARED(Ny,Nx,dv,v,boundary))
        do j=2,(Ny-1)
                do i=2,(Nx-1)
                        dv(i,j) =
0.25d0*(v(i-1,j)+v(i+1,j)+v(i,j+1)+v(i,j-1)) - v(i,j)
                        dv(i,j) = dv(i,j)*(1.d0-DFLOAT(boundary(i,j)))
                end do
        end do
        !$OMP END DO

        max_dv = 0.d0
        !$OMP DO
        !$OMP& PRIVATE(i,j)
        !$OMP& SHARED(NX,NY,dv,v))
        !$OMP& REDUCTION(MAX:max_dv)
        do j=2,(Ny-1)
                do i=2,(Nx-1)
                        v(i,j) = v(i,j) + dv(i,j)
                        if(dv(i,j) .gt. max_dv) then
                                max_dv = dv(i,j)
                        endif
                end do
        end do
        !$OMP END DO
        !$OMP END PARALLEL

        if(max_dv .ge. convergence_v) then
                converged = 0
        endif

end do







! simulation finished
t2 = secnds(0.0)

print *," iterations necessary, ",num_iterations
print *," initialization time, ",t1-t0
print *," simulation time, ",t2-t1


open(unit=10,file="v_output.dat")
write(10,*) "# x\ty\tv(x,y)"
do j=1,Ny
        !do i=1,Nx
        ! skipping the full array print to save execution time
        ! the printed data file is normally sent to gnuplot w/ splot
        i=10
                write (10,*) i*dx,j*dy,v(i,j)
        !enddo
        write (10,*)" "
end do
close(10)


stop
end




subroutine
create_lightning_rod(v_ground,lightning_rod_center,lightning_rod_height,dx,dy,Nx,Ny,v,boundary)
IMPLICIT NONE
integer Nx, Ny,j,boundary(Nx,Ny)
integer j_limit
integer index_lightning_rod_center
real*8 v(Nx,Ny)
real*8 lightning_rod_center,lightning_rod_height
real*8 dx, dy, v_ground

index_lightning_rod_center = lightning_rod_center/dx
j_limit = lightning_rod_height/dy
do j=1,j_limit
        v(index_lightning_rod_center,j) = v_ground
        boundary(index_lightning_rod_center,j) = 1
end do

print *,"Created a lightning rod of height ",lightning_rod_height
print *,"\ty_index ",j_limit
print *,"\tx-position ",lightning_rod_center
print *,"\tx_index ",index_lightning_rod_center


end subroutine







subroutine
create_house(v_ground,house_center,house_height,house_width,dx,dy,Nx,Ny,v,boundary)
IMPLICIT NONE
integer Nx, Ny, boundary(Nx,Ny)
real*8 v(Nx,Ny)
real*8 v_ground, dx, dy
integer i,j,i_limit,j_limit, index_house_center
real*8 house_center,house_height,house_width

index_house_center = house_center/dx
i_limit = 0.5d0*house_width/dx
j_limit = house_height/dy
do j=1,j_limit
        do i=(index_house_center-i_limit),(index_house_center+i_limit)
                v(i,j) = v_ground
                boundary(i,j) = 1
        end do
end do

print *,"Created a house of height ",house_height
print *,"\ty_index ",j_limit
print *,"\twidth ",house_width
print *,"\thouse bounds:
",index_house_center-i_limit,index_house_center+i_limit

end subroutine

-- 
- - - - - - -   - - - - - - -   - - - - - - -
Nathan Moore
Assistant Professor, Physics
Winona State University
AIM: nmoorewsu
- - - - - - -   - - - - - - -   - - - - - - -



-- 
- - - - - - -   - - - - - - -   - - - - - - -
Nathan Moore
Assistant Professor, Physics
Winona State University
AIM: nmoorewsu
- - - - - - -   - - - - - - -   - - - - - - -

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.clustermonkey.net/pipermail/beowulf/attachments/20081126/3d2c4fa8/attachment-0001.html>
-------------- next part --------------
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf


More information about the Beowulf mailing list