Count the number of cycles in a CUDA kernel

How can I count the number of cycles performed by a function like the following? Should I just count the number of additions, multiplications, and divisions directly? Where can I check how many cycles an addition takes in CUDA?

__global__
void mandelbrotSet_per_element(Grayscale *image){
    float minR = -2.0f, maxR = 1.0f;
    float minI = -1.2f, maxI = minI + (maxR-minR) * c_rows / c_cols;
    float realFactor = (maxR - minR) / (c_cols-1);
    float imagFactor = (maxI - minI) / (c_rows-1);

    bool isInSet;
    float c_real, c_imag, z_real, z_imag;

    int y = blockDim.y * blockIdx.y + threadIdx.y;
    int x = blockDim.x * blockIdx.x + threadIdx.x;

    while (y < c_rows){
        while (x < c_cols) {
            c_real = minR + x * realFactor;
            c_imag = maxI - y * imagFactor;
            z_real = c_real;    z_imag = c_imag;
            isInSet = true;

            for (int k = 0; k < c_iterations; k++){
                float z_real2 = z_real * z_real;
                float z_imag2 = z_imag * z_imag;
                if (z_real2 + z_imag2 > 4){
                    isInSet = false;
                    break;
                }
                z_imag = 2 * z_real * z_imag + c_imag;
                z_real = z_real2 - z_imag2 + c_real;
            }
            if (isInSet)    image[y*c_cols+x] = 255;
            else            image[y*c_cols+x] = 0;

            x += blockDim.x * gridDim.x;
        }
        x = blockDim.x * blockIdx.x + threadIdx.x;
        y += blockDim.y * gridDim.y;
    }
}

Instruction throughput is described in the CUDA programming guide.

You can also try measuring a sequence of instructions using the native clock() device function.
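As a rough illustration of the timing approach, the sketch below uses clock64(), the 64-bit variant of clock(), to time a dependent chain of multiply-adds on a single thread. The kernel name time_fma_chain, the helper measure_fma_chain_cycles, and the constants are hypothetical and not taken from the question; this is one possible setup, not a definitive benchmark.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: times a dependent chain of n multiply-adds on one thread.
__global__ void time_fma_chain(float seed, int n, long long *cycles, float *sink) {
    float z = seed;
    long long start = clock64();      // per-multiprocessor cycle counter
    for (int k = 0; k < n; k++) {
        z = z * 0.999f + 0.001f;      // each step depends on the previous one
    }
    long long stop = clock64();
    *sink = z;                        // store the result so the loop is not optimized away
    *cycles = stop - start;
}

// Hypothetical host helper: returns the elapsed cycles, or -1 on any CUDA error.
long long measure_fma_chain_cycles(int n) {
    long long *d_cycles = nullptr;
    float *d_sink = nullptr;
    long long h_cycles = -1;
    if (cudaMalloc(&d_cycles, sizeof(long long)) != cudaSuccess) return -1;
    if (cudaMalloc(&d_sink, sizeof(float)) != cudaSuccess) {
        cudaFree(d_cycles);
        return -1;
    }
    time_fma_chain<<<1, 1>>>(1.0f, n, d_cycles, d_sink);
    cudaError_t err = cudaGetLastError();          // catches launch failures
    if (err == cudaSuccess) {
        // The blocking copy also waits for the kernel to finish.
        err = cudaMemcpy(&h_cycles, d_cycles, sizeof(long long),
                         cudaMemcpyDeviceToHost);
    }
    if (err != cudaSuccess) h_cycles = -1;
    cudaFree(d_cycles);
    cudaFree(d_sink);
    return h_cycles;
}

int main() {
    const int n = 1000;
    long long cycles = measure_fma_chain_cycles(n);
    if (cycles < 0) {
        fprintf(stderr, "CUDA error\n");
        return 1;
    }
    printf("%lld cycles for %d iterations (%.2f per iteration)\n",
           cycles, n, (double)cycles / n);
    return 0;
}
```

The reported figure includes loop overhead, and the compiler is free to schedule work around the clock reads, so treat it as an estimate and check it against the generated machine code.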

The compiler tends to obscure the actual operation counts visible at the source code level (increasing or possibly decreasing the apparent arithmetic intensity). If you want to identify exactly what the machine is doing, inspect the PTX (nvcc -ptx ...) or the machine-level assembly code, called SASS, which you can extract from an executable using the cuobjdump utility.
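To make the inspection step concrete, here is a minimal sketch that isolates one update of the question's inner loop in its own kernel, so the generated PTX or SASS is short enough to read. The file name step.cu, the kernel mandel_step, and the helper run_step are hypothetical; the commands in the leading comments use the nvcc -ptx and cuobjdump -sass options.

```cuda
// step.cu (hypothetical file name)
// PTX:   nvcc -ptx step.cu -o step.ptx
// SASS:  nvcc -o step step.cu && cuobjdump -sass step
#include <cstdio>
#include <cuda_runtime.h>

// One Mandelbrot update z <- z^2 + c, the arithmetic core of the inner loop.
__global__ void mandel_step(float z_real, float z_imag,
                            float c_real, float c_imag, float *out) {
    float z_real2 = z_real * z_real;
    float z_imag2 = z_imag * z_imag;
    out[0] = z_real2 - z_imag2 + c_real;      // new real part
    out[1] = 2 * z_real * z_imag + c_imag;    // new imaginary part
}

// Hypothetical host helper: runs one step and copies the result back.
// Returns false on any CUDA error.
bool run_step(float zr, float zi, float cr, float ci, float result[2]) {
    float *d_out = nullptr;
    if (cudaMalloc(&d_out, 2 * sizeof(float)) != cudaSuccess) return false;
    mandel_step<<<1, 1>>>(zr, zi, cr, ci, d_out);
    cudaError_t err = cudaGetLastError();     // catches launch failures
    if (err == cudaSuccess) {
        err = cudaMemcpy(result, d_out, 2 * sizeof(float),
                         cudaMemcpyDeviceToHost);
    }
    cudaFree(d_out);
    return err == cudaSuccess;
}

int main() {
    float r[2];
    if (!run_step(1.0f, 1.0f, 0.0f, 0.0f, r)) {
        fprintf(stderr, "CUDA error\n");
        return 1;
    }
    printf("z = (%f, %f)\n", r[0], r[1]);     // (1 + i)^2 = 2i
    return 0;
}
```

Comparing the arithmetic instructions in the dump with the source lines shows how many operations the compiler actually emitted, which may differ from a count of the multiplications and additions written in the source.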
