what is a flop
rbw at ahpcrc.org
Tue Jun 17 17:40:25 EDT 2003
Dan Kidger wrote:
>The P4 can normally do one flop per cycle - but it can do two (using its
>SSE2 unit) but only if both are multiplies (or both adds) *and* both pairs
>of operands are contiguous in memory.
>Some architectures have two separate floating point units: one just adds,
>the other multiplies (can add too) and hence also have a '2' here
>Some architectures, most notably the vector machines, have 16 or more floating
>point units.
>Many architectures have fused multiply-add units (Alpha, Itanium, Mips,
>Power3, etc.) They can add the result of a multiply directly to another
>register. Thus these also have '2' flops per cycle.
>Some architectures have 2 (or more) 'muladd' units. Hence iirc Power3/4 and
>Itanium2 can yield 4 flops per cycle.
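The cases quoted above all reduce to the same arithmetic: peak rate is flops per cycle times clock rate. A minimal Python sketch of that, using the per-cycle figures exactly as quoted; the dictionary labels, the function name `peak_gflops`, and the 1 GHz clock are illustrative assumptions, not measurements of any real part.

```python
# Flops per cycle for the configurations listed in the quoted text.
# The labels are shorthand chosen for this sketch.
FLOPS_PER_CYCLE = {
    "P4, scalar": 1,
    "P4, SSE2 (paired ops, contiguous operands)": 2,
    "separate add and multiply units": 2,
    "one fused multiply-add unit": 2,
    "two fused multiply-add units": 4,
}


def peak_gflops(flops_per_cycle, clock_ghz):
    """Peak rate in Gflops = flops per cycle x clock in GHz."""
    return flops_per_cycle * clock_ghz


if __name__ == "__main__":
    clock_ghz = 1.0  # hypothetical clock, chosen only to compare the cases
    for name, fpc in FLOPS_PER_CYCLE.items():
        print(f"{name}: {peak_gflops(fpc, clock_ghz):.1f} Gflops at {clock_ghz} GHz")
```

Note that the P4's second flop per cycle only appears when the SSE2 conditions in the quote are met, so even this "peak" is conditional.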
A glass ;-) menagerie ... if you will.
Another point to be made about this is that peak performance per processor
is slippery to define ... where does the boundary of the processor actually
rest? At the edge of the chip, the module, ...? A safe definition is perhaps
a processing core consisting of a pairing of independent FPUs and dedicated
registers. Cray quotes 12.8 Gflops for their X1 MSP ... but that is for
a multi-chip module ... which includes 2 FPUs (mul and add) x 2 vector pipes x
4 vector cores ... this gives you Daniel's 16 FPUs.
As an exercise, compute the vector clock of the X1 and the peak performance
of a Power 4 core running at 1.3 GHz. How about the whole Power 4 chip? ;-)
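For anyone who wants to check their answers, here is a rough Python sketch of the arithmetic. It assumes one flop per FPU per cycle on the X1, takes the 4 flops per cycle for Power 4 from the quoted text, and assumes two cores per Power 4 chip (that last figure is not stated in this post). The function names are made up for illustration.

```python
def clock_ghz(peak_gflops, flops_per_cycle):
    """Clock implied by a quoted peak rate and a flops-per-cycle count."""
    return peak_gflops / flops_per_cycle


def peak_gflops(flops_per_cycle, clock, cores=1):
    """Peak rate in Gflops for `cores` identical cores at `clock` GHz."""
    return flops_per_cycle * clock * cores


# X1 MSP: 2 FPUs (mul and add) x 2 vector pipes x 4 vector cores.
X1_FPUS = 2 * 2 * 4
# Assumption: each FPU retires one flop per cycle.
x1_clock = clock_ghz(12.8, X1_FPUS)

# Power 4: 4 flops per cycle per core (two muladd units, per the quote).
power4_core = peak_gflops(4, 1.3)
# Assumption: two cores per Power 4 chip.
power4_chip = peak_gflops(4, 1.3, cores=2)

if __name__ == "__main__":
    print(f"X1 FPUs per MSP:  {X1_FPUS}")
    print(f"X1 vector clock:  {x1_clock * 1000:.0f} MHz")
    print(f"Power 4 core:     {power4_core:.1f} Gflops")
    print(f"Power 4 chip:     {power4_chip:.1f} Gflops")
```

Under those assumptions the X1 vector clock comes out at 800 MHz, a 1.3 GHz Power 4 core at 5.2 Gflops, and the chip at 10.4 Gflops.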
Finally, the relevance of the peak performance ends right about ... here.
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf