
CUDA Kernel Fails to Launch

Here is my code. I have an array of (x, y) pairs, and for each coordinate I want to find the farthest point from it.

// Error-checking helper: prints the CUDA error string with file and line, and aborts on failure
#define GPUERRCHK(ans) { gpuAssert((ans), __FILE__, __LINE__); }
inline void gpuAssert(cudaError_t code, const char *file, int line, bool abort=true)
{
   if (code != cudaSuccess)
   {
      fprintf(stderr,"GPUassert: %s %s %d\n", cudaGetErrorString(code), file, line);
      if (abort) exit(code);
   }
}

// Euclidean distance between (x1,y1) and (x2,y2); sqrtf keeps the arithmetic in single precision
__device__ float computeDist( float x1, float y1, float x2, float y2 )
{
    float delx = x2 - x1;
    float dely = y2 - y1;
    return sqrtf( delx*delx + dely*dely );
}

__global__ void kernel( float * x, float * y, float * dev_dist_sum, int N )
{
    int tid = blockIdx.x*gridDim.x + threadIdx.x;
    float a = x[tid];  //............(alpha)
    float b = y[tid];  //............(beta)
    if( tid < N )
    {
        float maxDist = -1;
        for( int k=0 ; k<N ; k++ )
        {
            //float dist = computeDist( x[tid], y[tid], x[k], y[k] ); //....(gamma)
            float dist = computeDist( a, b, x[k], y[k] );             //....(delta)
            if( dist > maxDist )
                maxDist = dist;
        }
        dev_dist_sum[tid] = maxDist;
    }
}

int main()
{
    // ...

    kernel<<<(N+31)/32,32>>>( dev_x, dev_y, dev_dist_sum, N );
    GPUERRCHK( cudaPeekAtLastError() );    // reports errors in the launch itself
    GPUERRCHK( cudaDeviceSynchronize() );  // reports faults that occur while the kernel runs

    // ...
}

I have an NVIDIA GeForce 420M and have verified that CUDA works with it on my computer. When I run the above code with N = 50000, the kernel fails to launch with the error message "unspecified launch failure". However, it seems to work fine for a smaller value such as 10000.

Also, if I comment out alpha, beta, and delta (see the markers in the code) and uncomment gamma, the code works even for large values of N such as 50000 or 100000.

I want to use alpha and beta so that each thread reads its own coordinates into thread-local registers once, instead of re-reading them from global memory on every iteration, reducing memory traffic.

How do I fix this issue?
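
The failure is consistent with two things visible in the kernel: the thread id is computed with gridDim.x (the number of blocks) where blockDim.x (the number of threads per block) is needed, and the alpha/beta loads execute before the tid < N guard, so out-of-range threads read past the ends of x and y. Below is a minimal sketch of the kernel with both points addressed and the register-caching idea kept; it assumes the rest of the program is unchanged.

__global__ void kernel( float * x, float * y, float * dev_dist_sum, int N )
{
    // block offset uses blockDim.x (threads per block),
    // not gridDim.x (blocks per grid)
    int tid = blockIdx.x*blockDim.x + threadIdx.x;
    if( tid < N )
    {
        // cache this thread's coordinates in registers (the alpha/beta idea),
        // now safely inside the bounds check
        float a = x[tid];
        float b = y[tid];
        float maxDist = -1;
        for( int k=0 ; k<N ; k++ )
        {
            float dist = computeDist( a, b, x[k], y[k] );
            if( dist > maxDist )
                maxDist = dist;
        }
        dev_dist_sum[tid] = maxDist;
    }
}

With this indexing, alpha and beta still do what was intended: a and b live in registers, so the inner loop reads only x[k] and y[k] from global memory.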

@mkuse: gridDim can be visualized as the 2-D spatial arrangement of thread blocks in the grid, and blockDim as the 3-D spatial arrangement of threads within a block. For instance, dim3 gridDim(2,3,1) means 2 thread blocks in the x direction and 3 thread blocks in the y direction; the maximum for each grid dimension on this hardware is 65535 (2^16 - 1). dim3 blockDim(32,16,1) is at thread granularity: 32 threads in the x direction and 16 in the y direction, 512 threads in total. You can access each thread with a thread id, but since there are multiple blocks you have to build that id from threadIdx together with blockIdx and the blockDim/gridDim sizes.
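
Concretely, the usual pattern for turning those indices into a unique global thread id looks like the sketch below; the 2-D variant matches the dim3 examples above.

// 1-D launch, as in kernel<<<(N+31)/32, 32>>>(...):
int tid = blockIdx.x * blockDim.x + threadIdx.x;  // block offset + position within the block

// 2-D launch, e.g. dim3 gridDim(2,3,1) with dim3 blockDim(32,16,1):
int tx = blockIdx.x * blockDim.x + threadIdx.x;   // global x index
int ty = blockIdx.y * blockDim.y + threadIdx.y;   // global y index
int id = ty * (gridDim.x * blockDim.x) + tx;      // flattened id; gridDim supplies the
                                                  // row width, never the offset within a row

This is also why N = 50000 fails with the original formula: gridDim.x = (50000+31)/32 = 1563, so blockIdx.x*gridDim.x + threadIdx.x reaches 1562*1563 + 31 = 2,441,437, far past the 50000-element arrays. For N = 10000 the overrun still exists but presumably lands in mapped memory, so the kernel only appears to work; and with gamma the guarded reads never go out of bounds, although most entries of dev_dist_sum are then never written because the ids are wrong.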
