Performance Variations using MPI/Myrico

Steffen Persvold sp at
Fri Apr 27 10:42:55 EDT 2001

Patrick Geoffray wrote:
> > 3) Any ideas on what could cause this much variation?
> I have some ideas, but nothing I would bet on. Mainly cache thrashing: the
> memory copy operation is improved with SSE by using the prefetching
> support, and this prefetch bypasses the L2 cache. Without SSE, the L2 cache
> is happily flushed as a processor is doing a copy. As the FFT code
> includes a copy step, who knows... :-)

Hmm, the NAS application runs in userspace, and since this inner loop (the FFT code) runs without any communication
with other nodes, why would an SSE-patched kernel improve its memcpy performance? I would expect the memcpy calls in
the FFT code to be either inlined by the compiler or resolved to libc's memcpy. Either way, the copy shouldn't
involve any system (kernel) time at all, right?
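
One rough way to check this is to compare user and system CPU time around a pure in-process copy. The sketch below is only an illustration, not the NAS FFT code: it is written in Python for brevity, and the function name copy_cpu_times, the buffer size, and the repeat count are all made up for the example. It assumes a Unix-like system where the resource module is available, and that the interpreter implements a same-size bytearray slice assignment as a plain memory copy in user space (as CPython does, as far as I know).

```python
import resource


def copy_cpu_times(size_bytes=16 * 1024 * 1024, repeats=50):
    """Return (user_seconds, system_seconds) spent on repeated in-process copies.

    Hypothetical helper for illustration; sizes and counts are arbitrary.
    """
    src = bytearray(size_bytes)  # allocated and zero-filled up front
    dst = bytearray(size_bytes)
    dst[:] = src  # warm-up copy, so first-touch page faults are not timed

    before = resource.getrusage(resource.RUSAGE_SELF)
    for _ in range(repeats):
        dst[:] = src  # same-size slice assignment: a buffer-to-buffer copy
    after = resource.getrusage(resource.RUSAGE_SELF)

    # ru_utime is CPU time in user mode, ru_stime is CPU time in the kernel.
    return (after.ru_utime - before.ru_utime,
            after.ru_stime - before.ru_stime)


if __name__ == "__main__":
    user, system = copy_cpu_times()
    print(f"user   {user:.3f} s")
    print(f"system {system:.3f} s")
```

If the copy really stays in userspace, nearly all of the measured time should show up as user time. A large system-time share would suggest something else is going on, such as page faults or swapping.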

 Steffen Persvold                        Systems Engineer
 Email  : mailto:sp at            Scali AS (
 Norway : Tlf  : (+47) 2262 8950         Olaf Helsets vei 6
          Fax  : (+47) 2262 8951         N-0621 Oslo, Norway

 USA    : Tlf  : (+1) 713 706 0544       10500 Richmond Avenue, Suite 190
                                         Houston, Texas 77042, USA
