Simplest way to clear CUDA shared memory between kernel runs

I am trying to implement a box filter in C-CUDA, starting by implementing a matrix-averaging problem in CUDA first. When I run the following code without commenting out the lines inside the for loops, I get a certain output. But when I comment out those lines, it generates the same output again!

if(tx==0)
        for(int i=1;i<=radius;i++)
        {
            //sharedTile[radius+ty][radius-i] = 6666.0;
        }

    if(tx==(Dx-1))
        for(int i=0;i<radius;i++)
        {
            //sharedTile[radius+ty][radius+Dx+i] = 7777;
        }

    if(ty==0)
        for(int i=1;i<=radius;i++)
        {
            //sharedTile[radius-i][radius+tx]= 8888;
        }

    if(ty==(Dy-1))
        for(int i=0;i<radius;i++)
        {
            //sharedTile[radius+Dy+i][radius+tx] = 9999;
        }

    if((tx==0)&&(ty==0))
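        // Note: the commas in the loop conditions below are the C comma operator,
        // so only the last condition (e.g. l<radius) is actually tested.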
        for(int i=globalRow,l=0;i<HostPaddedRow,l<radius;i++,l++)
        {
            for(int j=globalCol,m=0;j<HostPaddedCol,m<radius;j++,m++)
            {
                //sharedTile[l][m]=8866;
            }
        }

    if((tx==(Dx-1))&&(ty==(Dx-1)))
        for(int i=(HostPaddedRow+1),l=(radius+Dx);i<(HostPaddedRow+1+radius),l<(TILE+2*radius);i++,l++)
        {
            for(int j=HostPaddedCol,m=(radius+Dx);j<(HostPaddedCol+radius),m<(TILE+2*radius);j++,m++)
            {
                //sharedTile[l][m]=7799.0;
            }
        }

    if((tx==(Dx-1))&&(ty==0))
        for(int i=(globalRow),l=0;i<HostPaddedRow,l<radius;i++,l++)
        {
            for(int j=(HostPaddedCol+1),m=(radius+Dx);j<(HostPaddedCol+1+radius),m<(TILE+2*radius);j++,m++)
            {
                //sharedTile[l][m]=9966;
            }
        }

    if((tx==0)&&(ty==(Dy-1)))
        for(int i=(HostPaddedRow+1),l=(radius+Dy);i<(HostPaddedRow+1+radius),l<(TILE+2*radius);i++,l++)
        {
            for(int j=globalCol,m=0;j<HostPaddedCol,m<radius;j++,m++)
            {
                //sharedTile[l][m]=0.0;
            }
        }
    __syncthreads();

You can ignore those for-loop conditions; they are irrelevant here right now. My basic question is: why am I getting the same values even after commenting out those lines? I tried making some modifications in my main program and kernel as well. I also introduced deliberate errors, removed them, and compiled and executed the same code again, but I still get the same values. Is there any way to clear cache memory in CUDA? I am using Nsight + RedHat + CUDA 5.5. Thanks in advance.

why am I getting the same values even after commenting out those lines?

It seems that sharedTile points to the same piece of memory across multiple consecutive runs, which is absolutely normal. The commented-out code therefore does not "generate" anything; your pointer is simply pointing at the same memory, which was never flushed.
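To make this concrete, here is a minimal sketch (the kernel and buffer names are hypothetical, not from the question): a kernel that reads shared memory without ever initializing it. What such a read observes is formally undefined, but in practice it is often whatever an earlier launch left behind, which is exactly why the commented-out writes appear to persist.

__global__ void peek_shared(float* out, int count)
{
    extern __shared__ float tile[];
    // tile[] is deliberately NOT initialized: each thread copies out whatever
    // values a previous launch happened to leave in shared memory.
    for (int i = threadIdx.x; i < count; i += blockDim.x)
        out[i] = tile[i];
}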

Is there any way to clear cache memory in CUDA

I believe you are talking about clearing shared memory? If so, you can use an analogue of the approach described here. Instead of using cudaMemset in host code, you zero out the shared memory from inside the kernel. The simplest approach is to place the following code at the beginning of the kernel that declares sharedTile (this version is for one-dimensional thread blocks and a one-dimensional shared memory array):

__global__ void your_kernel(int count) {
    // Dynamically sized shared memory array; its byte size is supplied as the
    // third parameter of the launch configuration.
    extern __shared__ float sharedTile[];
    // Each thread zeroes a strided subset of the array.
    for (int i = threadIdx.x; i < count; i += blockDim.x)
        sharedTile[i] = 0.0f;
    __syncthreads();  // make the zeroed tile visible to all threads in the block
    // your code here
}
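Because sharedTile is declared extern __shared__, its size in bytes must be supplied as the third parameter of the kernel launch configuration. A minimal usage sketch (the grid and block dimensions are assumptions, not values from the question):

int count = 256;                               // number of floats in sharedTile
size_t shmemBytes = count * sizeof(float);     // dynamic shared memory per block
your_kernel<<<4, 128, shmemBytes>>>(count);    // 4 blocks of 128 threads
cudaDeviceSynchronize();

The question itself uses a two-dimensional, statically sized tile. The same idea carries over; here is a hedged sketch that assumes a padded tile of (TILE + 2*RADIUS) rows and columns, where TILE and RADIUS are hypothetical compile-time constants standing in for the question's variables:

#define TILE 16     // hypothetical tile width
#define RADIUS 2    // hypothetical filter radius

__global__ void box_filter_kernel()
{
    __shared__ float sharedTile[TILE + 2*RADIUS][TILE + 2*RADIUS];
    const int n = (TILE + 2*RADIUS) * (TILE + 2*RADIUS);
    // Flatten the 2D tile and zero it cooperatively with all threads in the block.
    for (int idx = threadIdx.y * blockDim.x + threadIdx.x; idx < n; idx += blockDim.x * blockDim.y)
        sharedTile[idx / (TILE + 2*RADIUS)][idx % (TILE + 2*RADIUS)] = 0.0f;
    __syncthreads();  // every thread now sees a fully zeroed tile
    // ... load data and apply the box filter here ...
}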

As Robert Crovella pointed out in a comment below, the following approaches do not guarantee that shared memory is cleared:

  • Or possibly call nvidia-smi with the --gpu-reset parameter.
  • Yet another solution was offered in another SO thread, which involves unloading and reloading the driver.
