I am curious as to why the number of flops reported by the profiler is not equal to the sum of ADDs, MULs and FMAs.
Invocations  Metric Name    Metric Description  Min        Max        Avg
Device "GeForce GTX 780 Ti (0)"
  Kernel: mul_mm(double const *, double*, int, int, int)
         30  flops_dp       FLOPS(Double)       159500000  159500000  159500000
         30  flops_dp_add   FLOPS(Double Add)   0          0          0
         30  flops_dp_mul   FLOPS(Double Mul)   17000000   17000000   17000000
         30  flops_dp_fma   FLOPS(Double FMA)   71250000   71250000   71250000
I get 159500000 - 17000000 - 71250000 = 71250000, which is exactly the FMA count again. Is this just a coincidence, or are FMAs counted twice?
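The flops metrics count the number of operations, not the number of instructions executed. FMA and DFMA each count as 2 operations (one multiply and one add). The profiler's definition of flops is inconsistent, because it counts an FMA as 2 in one counter (flops_dp) and as 1 in another (flops_dp_fma). So the total is flops_dp = flops_dp_add + flops_dp_mul + 2 * flops_dp_fma, which for your numbers gives 0 + 17000000 + 2 * 71250000 = 159500000.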
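The relationship between the counters can be checked with a few lines of arithmetic. This is only a sketch in Python, assuming that flops_dp weights each FMA as 2 operations while flops_dp_fma counts each FMA once. The counter values are copied from the profiler output in the question, and the helper name total_dp_ops is hypothetical, not part of any profiler API.

```python
# Counter values copied from the profiler output in the question.
flops_dp = 159_500_000
flops_dp_add = 0
flops_dp_mul = 17_000_000
flops_dp_fma = 71_250_000


def total_dp_ops(add, mul, fma):
    """Hypothetical helper: total operations when each FMA counts as 2."""
    return add + mul + 2 * fma


reconstructed = total_dp_ops(flops_dp_add, flops_dp_mul, flops_dp_fma)

# Compare the reconstructed total with the flops_dp counter.
print(reconstructed, reconstructed == flops_dp)
```

If the two numbers match, the leftover 71250000 in the question is simply the second operation of each FMA.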
Peak FLOPs is calculated as GpuClockFrequency * CudaCoresPerSm * SmCount * 2 ops/FMA.
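To make the formula concrete, here is a rough Python sketch. The function name peak_flops and the hardware numbers (875 MHz base clock, 192 CUDA cores per SM, 15 SMs for a GTX 780 Ti) are assumptions added for illustration and should be checked against the device's specifications. Note that CUDA cores are single-precision units, so this gives the single-precision peak; for a double-precision peak you would presumably substitute the number of double-precision units per SM.

```python
def peak_flops(clock_hz, cores_per_sm, sm_count, ops_per_fma=2):
    """Hypothetical helper: theoretical peak operations per second.

    Implements GpuClockFrequency * CudaCoresPerSm * SmCount * 2 ops/FMA,
    i.e. every core is assumed to retire one FMA per clock cycle.
    """
    return clock_hz * cores_per_sm * sm_count * ops_per_fma


# Assumed example values for a GTX 780 Ti (not taken from the thread).
clock_hz = 875_000_000   # base clock in Hz
cores_per_sm = 192       # single-precision CUDA cores per SM
sm_count = 15            # number of SMs

peak = peak_flops(clock_hz, cores_per_sm, sm_count)
print(f"{peak / 1e12:.2f} TFLOP/s theoretical single-precision peak")
```

A real kernel will normally fall well short of this figure, since it assumes every core issues an FMA on every cycle.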