
CUDA nvprof number of floating point ops

Why is the number of flops reported by the profiler not equal to the sum of the ADD, MUL and FMA counters?

Invocations                     Metric Name              Metric Description         Min         Max         Avg
Device "GeForce GTX 780 Ti (0)"
    Kernel: mul_mm(double const *, double*, int, int, int)
         30                        flops_dp                   FLOPS(Double)   159500000   159500000   159500000
         30                    flops_dp_add               FLOPS(Double Add)           0           0           0
         30                    flops_dp_mul               FLOPS(Double Mul)    17000000    17000000    17000000
         30                    flops_dp_fma               FLOPS(Double FMA)    71250000    71250000    71250000

I get 159500000 - 17000000 - 71250000 = 71250000, which is exactly the FMA count again. Is this just a coincidence, or are FMAs counted twice?

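The flops metrics count floating-point operations, not instructions executed. Each FMA (or DFMA) instruction counts as 2 operations in the aggregate flops_dp metric but only as 1 in flops_dp_fma, so the profiler's definition of a flop is inconsistent between counters. The FMAs are therefore effectively counted twice in the total: flops_dp = flops_dp_add + flops_dp_mul + 2 * flops_dp_fma = 0 + 17000000 + 2 * 71250000 = 159500000.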
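The relationship between the counters can be checked with a few lines of Python. This is only a sketch: the counter values are copied from the nvprof output in the question, and total_dp_flops is a hypothetical helper name, not part of any profiler API.

```python
# Counter values copied from the nvprof output in the question.
flops_dp_add = 0
flops_dp_mul = 17_000_000
flops_dp_fma = 71_250_000
flops_dp = 159_500_000


def total_dp_flops(add, mul, fma):
    """Hypothetical helper: count each FMA as 2 operations (multiply + add)."""
    return add + mul + 2 * fma


reconstructed = total_dp_flops(flops_dp_add, flops_dp_mul, flops_dp_fma)

# True if the aggregate metric matches the per-instruction counters
# once every FMA is weighted as two operations.
print(reconstructed == flops_dp)
```

Summing the three counters without the factor of 2 gives 88250000, which is short of flops_dp by exactly one more flops_dp_fma.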

Peak FLOPs is calculated as GpuClockFrequency * CudaCoresPerSm * SmCount * 2 ops/FMA.
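As a rough illustration of that formula, here is a minimal sketch. The function name peak_flops is hypothetical, and the hardware numbers (15 SMs, 192 CUDA cores per SM, 875 MHz base clock) are commonly quoted GTX 780 Ti specifications that do not appear in the original post, so treat them as assumptions. Because the formula counts CUDA cores, the result is a single-precision figure; a double-precision peak would need the number of double-precision units per SM instead, which the post does not give.

```python
def peak_flops(gpu_clock_hz, cuda_cores_per_sm, sm_count, ops_per_fma=2):
    """Peak FLOPS = clock * cores per SM * SM count * 2 ops per FMA."""
    return gpu_clock_hz * cuda_cores_per_sm * sm_count * ops_per_fma


# Assumed GTX 780 Ti figures (not taken from the original post).
clock_hz = 875_000_000   # base clock in Hz (assumption)
cores_per_sm = 192       # CUDA cores per SM (assumption)
sm_count = 15            # number of SMs (assumption)

peak = peak_flops(clock_hz, cores_per_sm, sm_count)
print(f"{peak / 1e12:.2f} TFLOPS")  # prints "5.04 TFLOPS" for these inputs
```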
