[Beowulf] quad-core SPECfp2006: where are 4 FPresults/cycle ?

Mikhail Kuzminsky kus at free.net
Sat Oct 13 10:57:05 EDT 2007


In message from Mark Hahn <hahn at mcmaster.ca> (Fri, 12 Oct 2007 
16:09:05 -0400 (EDT)):
>> This means that 2 additional FP results per cycle in the
>> microarchitecture give only about a 7% performance increase :-(
>
>the 4 flops/cycle is really for linpack-like code: it assumes you are 
>executing packed double SIMD.

Yes, but AFAIK most modern optimizing F9x compilers for x86 can 
generate code using SSEx instructions (instead of x87). And I assume 
that many real-world codes, including some from the SPECfp2006 set, 
work with floating-point vectors. The vectors need not be very long: 
a 128-bit SSE register holds only 2 double-precision (64-bit) 
elements.
In theory this could give a 2x speedup!
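
To illustrate what "vector length=2" means in practice, here is a 
minimal sketch of a packed-double loop written by hand with SSE2 
intrinsics. The function name daxpy_sse2 and the choice of a 
daxpy-style kernel are my own assumptions for illustration; this is 
not taken from SPECfp2006 or from any compiler's output. A 
vectorizing compiler would aim to produce similar packed 
instructions from the plain scalar loop.

```c
#include <assert.h>
#include <stddef.h>
#include <emmintrin.h>  /* SSE2 intrinsics (x86 / x86-64 only) */

/* Hypothetical helper: y[i] += a * x[i] for i in [0, n).
 * Each packed instruction works on 2 doubles, because a 128-bit
 * SSE register holds two 64-bit elements. */
void daxpy_sse2(size_t n, double a, const double *x, double *y)
{
    assert(x != NULL && y != NULL);

    __m128d va = _mm_set1_pd(a);          /* broadcast a into both lanes */
    size_t i = 0;

    /* Main loop: two elements per iteration, unaligned loads/stores. */
    for (; i + 1 < n; i += 2) {
        __m128d vx = _mm_loadu_pd(x + i);
        __m128d vy = _mm_loadu_pd(y + i);
        vy = _mm_add_pd(vy, _mm_mul_pd(va, vx));
        _mm_storeu_pd(y + i, vy);
    }

    /* Scalar tail for an odd n. */
    for (; i < n; ++i)
        y[i] += a * x[i];
}
```

The main loop does half as many arithmetic instructions as the 
scalar version, which is where the theoretical 2x comes from; 
whether it is reached depends on memory bandwidth and on how much 
of the program looks like this loop.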

>just that not all FP is SIMD-friendly, I think.
Yes, I agree w/"not all". But a 7% speedup would mean, I believe, 
that SIMD-friendly FP code is very rare?
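
A back-of-envelope check, assuming a simple Amdahl's-law model (my 
assumption, not a measurement): if a fraction f of the original run 
time is sped up by a factor k and the rest is unchanged, the overall 
speedup is S = 1 / ((1 - f) + f/k). Solving for f with S = 1.07 and 
k = 2 gives f of about 0.13, i.e. only about 13% of the run time 
would be benefiting from the doubled FP peak. The function name 
below is hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: fraction of the original run time that must
 * have been accelerated by peak_ratio to yield the observed overall
 * speedup, under Amdahl's law:
 *   S = 1 / ((1 - f) + f / k)  =>  f = (1 - 1/S) / (1 - 1/k)      */
double accelerated_fraction(double speedup, double peak_ratio)
{
    assert(speedup >= 1.0 && peak_ratio > 1.0);
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / peak_ratio);
}
```

Calling accelerated_fraction(1.07, 2.0) returns roughly 0.13. The 
model ignores memory-bandwidth limits, so the real share of 
vectorizable code could be larger than this estimate suggests.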

Yours
Mikhail

>  if your code spends 
>a lot of time in blas/lapack functions, I would expect it to see good 
>speedup.
>
>regards, mark hahn.

