CUDA kernel only works with 1D thread indexing

Question

I have a strange problem with the following code. When I call the first kernel, function, it does not give the correct result. However, when I call the second kernel, function2, it works fine. Does anyone have any idea what the problem is? Thanks!

__global__ void function(int w, class<double> C, float *result) {
    // 2D indexing: one thread per pixel
    int r = threadIdx.x + blockIdx.x * blockDim.x;
    int c = threadIdx.y + blockIdx.y * blockDim.y;
    int half_w = w / 2;

    if (r < w && c < w) {
        // Distance from the current pixel to the image center
        double dis = sqrt((double)(r - half_w) * (r - half_w) + (double)(c - half_w) * (c - half_w));
        result[c * w + r] = (float)C.getVal(dis);
    }
}


__global__ void function2(int w, class<double> C, float *result) {
    // 1D indexing: the flat thread id is split into a row and a column
    int tid = threadIdx.x + blockIdx.x * blockDim.x;

    int half_w = w / 2;
    int r = tid / w;
    int c = tid % w;

    if (r < w && c < w) {
        // Distance from the current pixel to the image center
        double dis = sqrt((double)(r - half_w) * (r - half_w) + (double)(c - half_w) * (c - half_w));
        result[c * w + r] = (float)C.getVal(dis);
    }
}

UPDATE: I use function and function2 to draw an image. The pixel value is based on the distance between the image center and the current pixel position. Given that distance, the getVal method of class C calculates the value for the pixel. So, in the kernel, each thread calculates the distance and the corresponding pixel value for one pixel. The correct result is determined by comparing against a CPU version.

function just gives seemingly random values, some very large and some very small. When I changed result[c * w + r] = (float)C.getVal(dis) to result[c * w + r] = 1.0f, the generated image did not seem to change.
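
For readers who want something to compare against, here is a minimal sketch of what such a CPU reference might look like. The post does not show the CPU version or the class behind C.getVal, so value_from_distance below is a purely hypothetical stand-in, and reference_image is an illustrative name rather than the author's code.

```cuda
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for C.getVal(); the real mapping is not shown in the post.
inline double value_from_distance(double dis) {
    return 1.0 / (1.0 + dis);
}

// CPU reference: one value per pixel, based on the distance to the image center.
// Uses the same index layout as the kernels: result[c * w + r].
std::vector<float> reference_image(int w) {
    std::vector<float> result(static_cast<std::size_t>(w) * w);
    const int half_w = w / 2;
    for (int c = 0; c < w; ++c) {
        for (int r = 0; r < w; ++r) {
            const double dr = r - half_w;
            const double dc = c - half_w;
            const double dis = std::sqrt(dr * dr + dc * dc);
            result[static_cast<std::size_t>(c) * w + r] =
                static_cast<float>(value_from_distance(dis));
        }
    }
    return result;
}
```

Comparing the GPU output element by element against a buffer like this makes it easy to tell "wrong values" apart from "the buffer was never written at all".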

The image size is W x W. To launch function I set:

dim3 grid_dim(w / 64 + 1, w / 64 + 1);
dim3 block_dim(64, 64);
function<<<grid_dim, block_dim>>>(W, C, cu_img);

To launch function2:

function2<<<W / 128 + 1, 128>>>(W, C, cu_img);

Fixed:

I found the problem. I assigned too many threads to one block. The maximum number of threads per block on my device is 1024, and a 64 x 64 block asks for 4096. In fact, when I ran cuda-memcheck, I could see that function (the kernel launched with the 64 x 64 block) was never even launched.
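
A failed launch like this is silent unless the error is checked explicitly. The sketch below shows one way to catch it with cudaGetLastError; noop_kernel and try_launch are hypothetical names used only for illustration, not part of the original code.

```cuda
#include <cuda_runtime.h>

// Empty kernel, used only to test whether a launch configuration is accepted.
__global__ void noop_kernel() {}

// Launches noop_kernel with one block of the given shape and returns the
// resulting status. An invalid configuration (too many threads per block)
// is reported by cudaGetLastError right after the launch.
cudaError_t try_launch(dim3 block) {
    noop_kernel<<<1, block>>>();
    cudaError_t err = cudaGetLastError();   // launch-time errors
    if (err == cudaSuccess) {
        err = cudaDeviceSynchronize();      // errors raised while the kernel runs
    }
    return err;
}
```

Calling try_launch(dim3(64, 64)) and printing cudaGetErrorString on the result would surface the problem immediately, instead of leaving the output buffer with whatever data it held before.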

Answer 1

I solved the problem. I assigned too many threads to one block. The maximum number of threads per block on my device is 1024, and a 64 x 64 block asks for 4096. In fact, when I ran cuda-memcheck, I could see that function (the kernel launched with the 64 x 64 block) was never launched.
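
One possible way to avoid the problem is to derive the block shape from the device limit instead of hard-coding it. This is a sketch under assumptions: blocks_for, pick_block_dim, and grid_for are hypothetical helper names, and the 16 x 16 fallback is just a conservative choice, not something from the original post.

```cuda
#include <cuda_runtime.h>

// Number of blocks needed to cover n elements with `block` threads each (rounded up).
inline unsigned int blocks_for(int n, int block) {
    return static_cast<unsigned int>((n + block - 1) / block);
}

// Picks a square 2D block whose total thread count stays within the
// device's maxThreadsPerBlock. Falls back to 16 x 16 if the query fails.
inline dim3 pick_block_dim(int device = 0) {
    cudaDeviceProp prop;
    int side = 16;
    if (cudaGetDeviceProperties(&prop, device) == cudaSuccess) {
        side = 32;
        while (side > 1 && side * side > prop.maxThreadsPerBlock) {
            side /= 2;
        }
    }
    return dim3(side, side);
}

// Grid that covers a w x w image with the given block shape.
inline dim3 grid_for(int w, dim3 block) {
    return dim3(blocks_for(w, block.x), blocks_for(w, block.y));
}
```

With these helpers the launch from the question would become dim3 block_dim = pick_block_dim(); dim3 grid_dim = grid_for(W, block_dim); followed by function<<<grid_dim, block_dim>>>(W, C, cu_img);. The bounds check already present in the kernel (r < w && c < w) handles the threads that fall outside the image.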

Answer 1 posted 2013-01-19 17:06:39
