Performance Variations using MPI/Myrico

Steffen Persvold sp at
Fri Apr 27 12:02:38 EDT 2001

Patrick Geoffray wrote:
> Steffen Persvold wrote:
> > Hmm, the NAS application runs in userspace and since this inner loop
> > (FFT code) runs without any communication with other nodes, why would an
> > SSE-patched kernel improve its memcpy performance? I would believe that
> > the memcpy calls in the FFT code were either inlined by the compiler, or
> > that a call to libc's memcpy was made. It shouldn't involve any
> > system (kernel) time at all, right?
> Hi Steffen,
> Yes, the NAS FT code does not call "memcpy()" at all. The copy
> step of the FFT is explicit (loop of assignments) and the PGI compiler
> is smart enough to use SSE prefetching to optimize this part of the code
> if SSE is available. But without a specific patch, the Linux kernel does
> not enable the SSE support (basically the kernel has to save the FP and
> the SSE registers during context switching), so the SSE optimization for
> PIII from PGI is useless. Now I am wondering if compiling with
> -Mvect=sse or -Mvect=prefetch with pgf90 WITHOUT the SSE support enabled
> in the kernel is not the source of this instability.

Actually, running SSE code (involving any SSE "mov" instructions) on a kernel
which doesn't save the SSE registers between context switches would result in a
segmentation fault.

I have learned this the hard way:

The original RH6.2 kernel (2.2.14-5.0) had PIII support and therefore saved the
SSE registers, but when RH released a kernel update because they experienced
data loss during context switches (RHBA-2000:013-01), I upgraded to
2.2.14-6.0.1. This kernel, however, did not have SSE support enabled, and my
hand-coded SSE routines suddenly caused a segmentation fault.

There are, however, some SSE instructions that don't require the registers to
be saved on a context switch (e.g. "sfence" and "prefetchnta").
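
To illustrate the prefetch-only case, here is a minimal sketch in C (not the
NAS FT source, and not PGI's generated code) of an explicit copy loop that only
issues prefetch hints. The function name copy_with_prefetch and the distance
PREFETCH_AHEAD are made up for this example. It assumes GCC or Clang, whose
__builtin_prefetch with locality 0 requests a non-temporal prefetch; on an x86
target with SSE this is typically emitted as "prefetchnta", though that is up
to the compiler.

```c
#include <stddef.h>

/* Hypothetical, untuned prefetch distance, counted in array elements. */
#define PREFETCH_AHEAD 64

/* Copy n doubles with an explicit loop of assignments, hinting the cache
 * ahead of the current read position. The hint only affects caching; it
 * does not read or write any SSE register. */
void copy_with_prefetch(double *dst, const double *src, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* Only form addresses that are inside the source array. */
        if (i + PREFETCH_AHEAD < n) {
            /* rw = 0 (read), locality = 0 (non-temporal hint) */
            __builtin_prefetch(&src[i + PREFETCH_AHEAD], 0, 0);
        }
        dst[i] = src[i];
    }
}
```

Since the data itself is moved by ordinary assignments, a loop like this should
not depend on the kernel saving the SSE registers, unlike code that uses the SSE
"mov" instructions.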

> Anyway, 50 % of variation for a pure computation piece of code seems too
> large to be explained by the SSE support. SSE on PIII is single
> precision only, so it does not help to get more Flops. Maybe there is
> something else in the patch that they applied, I will look at it.
I agree.

 Steffen Persvold                        Systems Engineer
 Email  : mailto:sp at            Scali AS (
 Norway : Tel  : (+47) 2262 8950         Olaf Helsets vei 6
          Fax  : (+47) 2262 8951         N-0621 Oslo, Norway

 USA    : Tel  : (+1) 713 706 0544       10500 Richmond Avenue, Suite 190
                                         Houston, Texas 77042, USA

Beowulf mailing list, Beowulf at