CUDA find sum of elements of the array
Hello, I want to find the sum of array elements using CUDA.
__global__ void countZeros(int *d_A, int *B)
{
    int index = blockIdx.x * blockDim.x + threadIdx.x;
    B[0] = B[0] + d_A[index];
}
So in the end, B[0] is supposed to contain the sum of all elements. But I noticed that B[0] equals zero every time, so in the end it contains only the last element. Why does B[0] become zero every time?
All of the threads are writing to B[0], and some may be attempting to write simultaneously. This line of code:
B[0] = B[0]+d_A[index];
requires a read and a write of B[0]. If multiple threads are doing this at the same time, you will get strange results: two threads can read the same old value of B[0], and whichever writes last overwrites the other's update.
You can make a simple fix by doing this:
atomicAdd(B, d_A[index]);
and you should get sensible results (assuming you have no errors elsewhere in your code, which you haven't shown). Be sure to initialize B[0] to some known value before calling this kernel.
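For illustration, here is a minimal sketch of how the atomicAdd fix might look in a complete program. The kernel name sumElements, the extra parameter n with its bounds check, the host helper sumOnDevice, and the launch configuration of 256 threads per block are all assumptions added here; the question shows only the kernel body. Error checking is left out to keep the sketch short.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread adds one element into B[0] atomically.
__global__ void sumElements(const int *d_A, int *B, int n)
{
    int index = blockIdx.x * blockDim.x + threadIdx.x;
    if (index < n)                  // guard: the grid may cover more than n elements
        atomicAdd(B, d_A[index]);   // indivisible read-modify-write of B[0]
}

// Hypothetical host helper: copies h_A to the device, sums it there,
// and returns the result. Return codes are ignored for brevity.
int sumOnDevice(const int *h_A, int n)
{
    int *d_A = nullptr;
    int *d_B = nullptr;
    int result = 0;

    cudaMalloc(&d_A, n * sizeof(int));
    cudaMalloc(&d_B, sizeof(int));
    cudaMemcpy(d_A, h_A, n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemset(d_B, 0, sizeof(int));    // initialize B[0] to a known value

    int threads = 256;                           // assumed block size
    int blocks = (n + threads - 1) / threads;    // round up to cover all n elements
    sumElements<<<blocks, threads>>>(d_A, d_B, n);

    // This blocking copy runs after the kernel has finished.
    cudaMemcpy(&result, d_B, sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(d_A);
    cudaFree(d_B);
    return result;
}

int main()
{
    const int n = 1000;
    int h_A[n];
    for (int i = 0; i < n; ++i)
        h_A[i] = 1;

    printf("sum = %d\n", sumOnDevice(h_A, n));
    return 0;
}
```

With an array of ones the printed sum should equal the element count, which makes a lost update easy to spot. Every thread still contends for the same address, so this version is correct but not fast.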
If you want to do this efficiently, however, you should study the CUDA reduction sample or just use CUB.
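To give a rough idea of what a reduction does, the following sketch sums each block's elements in shared memory and then issues a single atomicAdd per block instead of one per thread. It is not the code from the CUDA reduction sample, only a simplified version in the same spirit. The names blockSum and reduceOnDevice are hypothetical, and the sketch assumes the block size is a power of two.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: tree reduction within each block, then one atomicAdd per block.
// Assumes blockDim.x is a power of two.
__global__ void blockSum(const int *d_A, int *B, int n)
{
    extern __shared__ int sdata[];   // one int per thread, sized at launch

    int tid = threadIdx.x;
    int index = blockIdx.x * blockDim.x + threadIdx.x;

    // Load one element per thread; out-of-range threads contribute 0.
    sdata[tid] = (index < n) ? d_A[index] : 0;
    __syncthreads();

    // Halve the number of active threads each step, adding pairs of partial sums.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride)
            sdata[tid] += sdata[tid + stride];
        __syncthreads();
    }

    // Thread 0 holds the block's total and adds it to the global sum.
    if (tid == 0)
        atomicAdd(B, sdata[0]);
}

// Hypothetical host helper; return codes are ignored for brevity.
int reduceOnDevice(const int *h_A, int n)
{
    int *d_A = nullptr;
    int *d_B = nullptr;
    int result = 0;

    cudaMalloc(&d_A, n * sizeof(int));
    cudaMalloc(&d_B, sizeof(int));
    cudaMemcpy(d_A, h_A, n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemset(d_B, 0, sizeof(int));    // B[0] must start at a known value

    int threads = 256;                           // power of two, as assumed above
    int blocks = (n + threads - 1) / threads;
    blockSum<<<blocks, threads, threads * sizeof(int)>>>(d_A, d_B, n);

    cudaMemcpy(&result, d_B, sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(d_A);
    cudaFree(d_B);
    return result;
}

int main()
{
    const int n = 1000;
    int h_A[n];
    for (int i = 0; i < n; ++i)
        h_A[i] = 1;

    printf("sum = %d\n", reduceOnDevice(h_A, n));
    return 0;
}
```

The design choice here is to trade contention for a little shared memory: with 256 threads per block, the number of atomic operations on B[0] drops by a factor of 256. A tuned library routine such as the one in CUB will usually do better still.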
And be sure to use proper CUDA error checking any time you are having trouble with CUDA code.
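One common way to do this is to wrap every runtime call in a checking macro and to query the error state after each kernel launch. The sketch below is only one possible form: the macro name CUDA_CHECK and the placeholder kernel noop are made up for this example, while cudaGetLastError, cudaDeviceSynchronize, and cudaGetErrorString are standard CUDA runtime calls.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro: abort with a message if a runtime call fails.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error: %s at %s:%d\n",              \
                    cudaGetErrorString(err_), __FILE__, __LINE__);    \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Placeholder kernel standing in for the real one.
__global__ void noop() {}

int main()
{
    int *d_B = nullptr;
    CUDA_CHECK(cudaMalloc(&d_B, sizeof(int)));
    CUDA_CHECK(cudaMemset(d_B, 0, sizeof(int)));

    noop<<<1, 1>>>();
    CUDA_CHECK(cudaGetLastError());        // reports launch errors (bad configuration, etc.)
    CUDA_CHECK(cudaDeviceSynchronize());   // reports errors raised while the kernel ran

    CUDA_CHECK(cudaFree(d_B));
    printf("no CUDA errors reported\n");
    return 0;
}
```

Kernel launches return no status themselves, which is why the two checks after the launch are needed: the first catches a failed launch, the second catches faults such as out-of-bounds accesses during execution.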
So, if you still can't get sensible results, please instrument your code with proper CUDA error checking before asking "I made this change but it still doesn't work, why?" I can't tell you why, because this is the only snippet of code that you've shown.