CUDA nvprof number of floating point ops

Question

Why is the number of flops reported by the profiler not equal to the sum of the ADD, MUL and FMA counts?

Invocations                     Metric Name              Metric Description         Min         Max         Avg
Device "GeForce GTX 780 Ti (0)"
    Kernel: mul_mm(double const *, double*, int, int, int)
         30                        flops_dp                   FLOPS(Double)   159500000   159500000   159500000
         30                    flops_dp_add               FLOPS(Double Add)           0           0           0
         30                    flops_dp_mul               FLOPS(Double Mul)    17000000    17000000    17000000
         30                    flops_dp_fma               FLOPS(Double FMA)    71250000    71250000    71250000

I get 159500000 - 17000000 - 71250000 = 71250000, which is exactly the FMA count again. Is this a coincidence, or are FMAs counted twice?

Answer 1

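The flops metrics count operations, not instructions executed. In the aggregate counter (flops_dp), each FMA or DFMA counts as 2 operations (one multiply and one add), whereas flops_dp_fma counts each FMA once. So flops_dp = flops_dp_add + flops_dp_mul + 2 * flops_dp_fma. The profiler's definition of flops is inconsistent in that an FMA counts as 2 in one counter and as 1 in another.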
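As a quick check of that accounting against the numbers in the question, here is a minimal Python sketch. The function name total_flops is made up for illustration and is not part of nvprof; the counter values are copied from the profiler output above.

```python
def total_flops(add, mul, fma):
    """Combine per-type counters the way the aggregate flops metric appears to:
    adds and muls count once, each FMA counts as two operations."""
    return add + mul + 2 * fma


# Values copied from the nvprof output in the question.
flops_dp_add = 0
flops_dp_mul = 17_000_000
flops_dp_fma = 71_250_000
flops_dp = 159_500_000

reconstructed = total_flops(flops_dp_add, flops_dp_mul, flops_dp_fma)
print(reconstructed, reconstructed == flops_dp)
```

The reconstructed total matches the reported flops_dp, which is why subtracting the MUL and FMA counters once leaves exactly one more copy of the FMA count.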

Peak FLOPs is calculated as GpuClockFrequency * CudaCoresPerSm * SmCount * 2 ops/FMA.
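The formula can be sketched in Python as below. The device figures are assumptions taken from commonly published GTX 780 Ti specifications (875 MHz base clock, 192 CUDA cores per SM, 15 SMs), not from the question, so substitute the values for your own device. The helper name peak_flops is hypothetical. Because the formula uses the CUDA core count, the result is a single-precision peak; a double-precision peak would need the number of double-precision units per SM instead.

```python
def peak_flops(clock_hz, cores_per_sm, sm_count, ops_per_fma=2):
    """Peak FLOPs = GpuClockFrequency * CudaCoresPerSm * SmCount * 2 ops/FMA."""
    return clock_hz * cores_per_sm * sm_count * ops_per_fma


# Assumed GTX 780 Ti figures (illustrative, check your device's specs).
clock_hz = 875_000_000   # base clock, 875 MHz
cores_per_sm = 192       # CUDA cores per SM
sm_count = 15            # number of SMs

peak = peak_flops(clock_hz, cores_per_sm, sm_count)
print(f"{peak / 1e12:.2f} TFLOPs")
```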
