
CUDA find sum of elements of the array

Hello, I want to find the sum of array elements using CUDA.

__global__ void countZeros(int *d_A, int * B)
{
    int index = blockIdx.x * blockDim.x + threadIdx.x;
    B[0] = B[0]+d_A[index];
}

So in the end, B[0] is supposed to contain the sum of all elements. But I noticed that B[0] equals zero every time, so in the end it contains only the last element. Why does B[0] become zero every time?

All of the threads are writing to B[0], and some may be attempting to write simultaneously. This line of code:

B[0] = B[0]+d_A[index];

requires a read and a write of B[0]. If multiple threads are doing this at the same time, you will get strange results: one thread can read B[0] before another thread's write has landed, so that other thread's update is lost.

You can make a simple fix by doing this:

atomicAdd(B, d_A[index]);

and you should get sensible results (assuming there are no errors elsewhere in your code, which you haven't shown). Be sure to initialize B[0] to some known value before calling this kernel.
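For illustration, here is a minimal sketch of how the atomicAdd fix might look in a complete program. The kernel name sumAtomic, the host helper sumOnDevice, the length parameter n, and the block size of 256 are all assumptions made for this example; none of them come from the question's code. Error checking is left out to keep it short.

```cuda
#include <cuda_runtime.h>

// Each thread adds one element into *sum with an atomic read-modify-write,
// so concurrent updates are serialized instead of overwriting each other.
__global__ void sumAtomic(const int *d_A, int *sum, int n)
{
    int index = blockIdx.x * blockDim.x + threadIdx.x;
    if (index < n) {                 // guard threads past the end of the array
        atomicAdd(sum, d_A[index]);
    }
}

// Hypothetical host helper: copies h_A to the device, runs the kernel,
// and returns the sum. Assumes n > 0.
int sumOnDevice(const int *h_A, int n)
{
    int *d_A = nullptr;
    int *d_sum = nullptr;
    int h_sum = 0;

    cudaMalloc(&d_A, n * sizeof(int));
    cudaMalloc(&d_sum, sizeof(int));
    cudaMemcpy(d_A, h_A, n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemset(d_sum, 0, sizeof(int));   // initialize the accumulator to a known value

    int threads = 256;                        // assumed block size
    int blocks = (n + threads - 1) / threads; // enough blocks to cover n elements
    sumAtomic<<<blocks, threads>>>(d_A, d_sum, n);

    // This blocking copy waits for the kernel to finish before reading the result.
    cudaMemcpy(&h_sum, d_sum, sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(d_A);
    cudaFree(d_sum);
    return h_sum;
}
```

The bounds check matters whenever n is not a multiple of the block size; without it, the extra threads in the last block would read past the end of d_A.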

If you want to do this efficiently, however, you should study the CUDA reduction sample or just use CUB.
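As a rough idea of what a more efficient version could look like, the sketch below reduces each block's values in shared memory and issues only one atomicAdd per block. It is a simplified illustration of the general reduction technique, not the code of the CUDA reduction sample or of CUB. The names sumReduce, sumWithReduction, and kThreads, and the cap of 64 blocks, are assumptions for this example.

```cuda
#include <cuda_runtime.h>

constexpr int kThreads = 256;  // assumed block size; must be a power of two

__global__ void sumReduce(const int *d_A, int *sum, int n)
{
    __shared__ int partial[kThreads];

    // Grid-stride loop: each thread accumulates a private partial sum.
    int local = 0;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += blockDim.x * gridDim.x) {
        local += d_A[i];
    }
    partial[threadIdx.x] = local;
    __syncthreads();

    // Tree reduction in shared memory: halve the active threads each step.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride) {
            partial[threadIdx.x] += partial[threadIdx.x + stride];
        }
        __syncthreads();
    }

    // One atomic update per block instead of one per element.
    if (threadIdx.x == 0) {
        atomicAdd(sum, partial[0]);
    }
}

// Hypothetical host helper. Assumes n > 0; error checking omitted for brevity.
int sumWithReduction(const int *h_A, int n)
{
    int *d_A = nullptr;
    int *d_sum = nullptr;
    int h_sum = 0;

    cudaMalloc(&d_A, n * sizeof(int));
    cudaMalloc(&d_sum, sizeof(int));
    cudaMemcpy(d_A, h_A, n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemset(d_sum, 0, sizeof(int));   // accumulator starts at a known value

    int blocks = (n + kThreads - 1) / kThreads;
    if (blocks > 64) blocks = 64;        // assumed cap; the grid-stride loop covers the rest
    sumReduce<<<blocks, kThreads>>>(d_A, d_sum, n);

    cudaMemcpy(&h_sum, d_sum, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_A);
    cudaFree(d_sum);
    return h_sum;
}
```

Compared with calling atomicAdd once per element, this keeps most of the additions in registers and shared memory, so far fewer threads contend for the single output location.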

And be sure to use proper CUDA error checking any time you are having trouble with a CUDA code.

So, if you still can't get sensible results, please instrument your code with proper CUDA error checking before asking "I made this change but it still doesn't work, why?" I can't tell you why, because this is the only snippet of code that you've shown.
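One common way to do the error checking mentioned above is to wrap every runtime call in a macro and to check for errors after each kernel launch. The sketch below is one possible form of that pattern; the macro name CUDA_CHECK, the kernel addOne, and the helper addOneOnDevice are hypothetical names chosen for this example.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro: abort with file and line if a runtime call fails.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error: %s at %s:%d\n",              \
                    cudaGetErrorString(err_), __FILE__, __LINE__);    \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Trivial kernel used only to demonstrate launch checking.
__global__ void addOne(int *value)
{
    *value += 1;
}

// Round-trips one int through the device with every call checked.
int addOneOnDevice(int h_value)
{
    int *d_value = nullptr;
    CUDA_CHECK(cudaMalloc(&d_value, sizeof(int)));
    CUDA_CHECK(cudaMemcpy(d_value, &h_value, sizeof(int), cudaMemcpyHostToDevice));

    addOne<<<1, 1>>>(d_value);
    CUDA_CHECK(cudaGetLastError());        // catches launch configuration errors
    CUDA_CHECK(cudaDeviceSynchronize());   // catches errors raised while the kernel runs

    CUDA_CHECK(cudaMemcpy(&h_value, d_value, sizeof(int), cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(d_value));
    return h_value;
}
```

Kernel launches do not return an error code themselves, which is why the launch is followed by cudaGetLastError and cudaDeviceSynchronize rather than being wrapped directly.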
