what is a flop

Daniel Kidger
Wed Jun 18 13:54:10 EDT 2003

From: Mikhail Kuzminsky
>To: roberto.ammendola at roma2.infn.it
>Cc: beowulf at beowulf.org
>Subject: Re: what is a flop
>According to Roberto Ammendola
>> The "Floating point operations per clock cycle" depends on the 
>> processor, obviously, and on which instructions you use in your code. 
>> For example in a processor with the SSE instruction set you can perform 
>> 4 operations (on 32 bit register each) per clock cycle. One processor 
>> (Xeon or P4) running at 

>  Taking into account that throughput of FMUL and FADD units in
>P4/Xeon is 2 cycles, i.e. FP result may be received on any 2nd cycle
>only, the peak Performance of P4/2 GHz must be 4 GFLOPS.

IMHO you are both right and wrong at the same time.

The P4/Xeon *can* do 8 Gflop/s at 2 GHz, but only in 'single-precision'. It
does this by issuing just one SSE2 instruction per cycle, where that
instruction does 4 muls (or adds) on a 128-bit load (as 4*4byte consecutive
floats). Compare with doing 2 muls (or adds) on 2*8byte consecutive 'doubles'.
A Flop is usually defined as being on a number of at least 64 bits.
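
To make the lane counts above concrete, here is a minimal Python sketch. It
only models the byte layout of a 128-bit register with the standard struct
module; it does not execute any SSE instruction, and the names
REGISTER_BYTES and lanes() are made up for this illustration.

```python
import struct

REGISTER_BYTES = 16  # one 128-bit SSE register (assumed width, per the text)

def lanes(format_char):
    """How many values of this struct format fit in one 128-bit register."""
    return REGISTER_BYTES // struct.calcsize("<" + format_char)

# Four consecutive 4-byte floats and two consecutive 8-byte doubles
# both fill exactly 16 bytes.
four_floats = struct.pack("<4f", 1.0, 2.0, 3.0, 4.0)
two_doubles = struct.pack("<2d", 1.0, 2.0)

print("floats per register: ", lanes("f"), "bytes:", len(four_floats))
print("doubles per register:", lanes("d"), "bytes:", len(two_doubles))
```

So one packed instruction touches 4 values in single precision but only 2 in
double precision, which is where the factor of two in the peak figures
comes from.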

IIRC the P4/Xeon can only issue one floating point instruction per cycle,
and so outside of the SSE2 unit it can only achieve as many Gflop/s as its
clock speed in GHz.

Hence a 2.0 GHz P4/Xeon should be quoted as 4 Gflop/s peak (2
double-precision operations per cycle at 2.0 GHz).
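
The arithmetic behind these figures is just clock speed times floating point
operations per cycle. A small Python sketch, assuming the per-cycle counts
given in this message (4 single-precision, 2 double-precision, 1 outside the
SSE2 unit); peak_gflops() is a hypothetical helper, not a measurement:

```python
def peak_gflops(clock_ghz, flops_per_cycle):
    """Theoretical peak in Gflop/s: clock (GHz) times flops per cycle."""
    return clock_ghz * flops_per_cycle

CLOCK_GHZ = 2.0  # the 2.0 GHz P4/Xeon discussed in this thread

# Flops per cycle as assumed in this message, not measured values.
cases = [
    ("SSE2, single precision (4 x 4-byte floats)", 4),
    ("SSE2, double precision (2 x 8-byte doubles)", 2),
    ("outside SSE2, one FP instruction per cycle", 1),
]

for label, flops_per_cycle in cases:
    print(f"{label}: {peak_gflops(CLOCK_GHZ, flops_per_cycle):.1f} Gflop/s")
```

These are upper bounds only; real code rarely sustains one packed instruction
every cycle.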

As a side note, SSE2 only works to the standard 64 bits, but the x87 FPU
works to 80 bits; hence you often get slightly different numerical results
when comparing IEEE maths between the P4 FPU and SSE2, or indeed between
the P4 and, say, an Alpha.
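
A minimal Python sketch of why the results can differ. This is a software
model, not real hardware: it rounds exact fractions to a 53-bit significand
(a 64-bit double) or a 64-bit significand (the 80-bit extended format) and
ignores exponent range. The helper names round_to_bits() and evaluate() are
made up for this illustration.

```python
from fractions import Fraction

def round_to_bits(x, bits):
    """Round x to a significand of `bits` bits, round-half-to-even."""
    if x == 0:
        return Fraction(0)
    sign = -1 if x < 0 else 1
    m = abs(Fraction(x))
    e = 0
    # Scale m into [2**(bits-1), 2**bits) so its integer part has `bits` bits.
    while m >= 2**bits:
        m /= 2
        e += 1
    while m < 2**(bits - 1):
        m *= 2
        e -= 1
    n = round(m)  # round() on a Fraction rounds half to even
    return sign * Fraction(n) * Fraction(2) ** e

def evaluate(bits):
    """Compute (1 + 2**-53) - 1, rounding each step to `bits` bits."""
    one = Fraction(1)
    tiny = Fraction(1, 2**53)
    total = round_to_bits(one + tiny, bits)
    diff = round_to_bits(total - one, bits)
    return round_to_bits(diff, 53)  # final store to a 64-bit double

sse2_like = evaluate(53)  # every intermediate held as a 64-bit double
x87_like = evaluate(64)   # intermediates held in 80-bit extended precision

print("53-bit intermediates:", float(sse2_like))
print("64-bit intermediates:", float(x87_like))
```

In this model the 53-bit path loses the small term when it is added to 1,
while the 64-bit path keeps it, so the same expression gives two different
answers.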


Dr. Dan Kidger, Quadrics Ltd.
One Bridewell St., Bristol, BS1 2AA, UK         0117 915 5505
