[Beowulf] crunch per kilowatt: GPU vs. CPU
Craig.Tierney at noaa.gov
Mon May 18 16:24:37 EDT 2009
Bill Broadley wrote:
> Craig Tierney wrote:
>> Where did you get the 1/12th number for NVIDIA? Each streaming multiprocessor (SM)
>> has 1 single precision FPU per thread (8 threads per SM), but only 1 double precision FPU
>> for the whole SM. So that ratio would be 1/8.
> I just used the nvidia provided information:
> Click on specifications. 933/78=11.96
>> I have demonstrated this ratio on a simple
>> code that required little to no memory transfers.
> Maybe the ratios are different when the workload isn't optimal for getting the
> peak rate. Peak numbers often require very special situations, something like
> interleaved adds and multiplies or a fused instruction that does 2 flops. So
> maybe for pure multiplications you get 1/8th, but for the perfect workload you
> get 1/12th.
Go figure, it does say that. I dug in and the explanation I found is that
the single precision number is based on 3 operations per cycle. Two come
from the FP unit (MADD operation) and the third comes from the "special function"
unit. The special function unit is for transcendental functions, and several
If you are smart (or lucky) you could get the 1/12 ratio. If the single precision
peak is based on just MADD, the ratio is 1/8.
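
For anyone who wants to check the arithmetic, here is a rough sketch of how
both ratios fall out. The SM count (30) and shader clock (1.296 GHz) are my
assumptions for a GT200-class part and are not stated anywhere in this thread;
the function and constant names are made up for illustration. Only the 8 SP
units and 1 DP unit per SM, and the 2 vs. 3 operations per cycle, come from the
discussion above.

```python
# Back-of-the-envelope peak-rate arithmetic for the 1/8 vs. 1/12 ratios.
# ASSUMPTIONS (not from the thread): 30 SMs and a 1.296 GHz shader clock.
SMS = 30                 # assumed number of streaming multiprocessors
CLOCK_GHZ = 1.296        # assumed shader clock in GHz
SP_UNITS_PER_SM = 8      # from the thread: 8 single precision FPUs per SM
DP_UNITS_PER_SM = 1      # from the thread: 1 double precision FPU per SM


def peak_gflops(units_per_sm, flops_per_unit_per_cycle,
                sms=SMS, clock_ghz=CLOCK_GHZ):
    """Peak GFLOPS = SMs * units per SM * flops per unit per cycle * GHz."""
    return sms * units_per_sm * flops_per_unit_per_cycle * clock_ghz


# Single precision counting only the MADD (2 flops per cycle per unit).
sp_madd = peak_gflops(SP_UNITS_PER_SM, 2)
# Single precision counting MADD plus the special function unit (3 per cycle).
sp_madd_sfu = peak_gflops(SP_UNITS_PER_SM, 3)
# Double precision, MADD only, one unit per SM.
dp_madd = peak_gflops(DP_UNITS_PER_SM, 2)

print(f"SP (MADD only):  {sp_madd:.2f} GFLOPS")
print(f"SP (MADD + SFU): {sp_madd_sfu:.2f} GFLOPS")
print(f"DP (MADD only):  {dp_madd:.2f} GFLOPS")
print(f"SP/DP, MADD only:  {sp_madd / dp_madd:.2f}")
print(f"SP/DP, MADD + SFU: {sp_madd_sfu / dp_madd:.2f}")
```

Under those assumptions the 3-operations-per-cycle figure lands close to the
933 quoted above and the double precision figure close to 78, so the two
ratios differ only in whether the special function unit is counted.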
Craig Tierney (craig.tierney at noaa.gov)
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf