CUDA find sum of elements of the array

Hello, I want to find the sum of the elements of an array using CUDA.

__global__ void countZeros(int *d_A, int * B)
{
    int index = blockIdx.x * blockDim.x + threadIdx.x;
    B[0] = B[0]+d_A[index];
}

In the end, B[0] is supposed to contain the sum of all the elements. But I noticed that B[0] is equal to zero every time it is read, so in the end it contains only the last element. Why does B[0] become zero every time?

All of the threads are writing to B[0], and some may be attempting to write simultaneously. This line of code:

B[0] = B[0]+d_A[index];

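requires a read and a write of B[0]. If multiple threads do this at the same time, they can all read the same old value of B[0], and each write then overwrites the others, so most of the additions are lost. That is why you can end up with what looks like a single element instead of the sum.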

You can make a simple fix by doing this:

atomicAdd(B, d_A[index]);

and you should get sensible results (assuming you have no errors elsewhere in your code, which you haven't shown). Be sure to initialize B[0] to some known value, typically zero, before calling this kernel.
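To make that fix concrete, here is a minimal sketch of a complete program built around the atomicAdd version. The kernel name sumElements, the extra length parameter n, the bounds check, the host helper sumOnDevice, and the launch configuration of 256 threads per block are all my assumptions, since the question shows only the kernel body. Error checking is left out to keep it short.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Same idea as the kernel in the question, with two changes:
// atomicAdd replaces the unprotected read-modify-write, and a
// bounds check (n is an added, assumed parameter) keeps extra
// threads in the last block from reading past the array.
__global__ void sumElements(const int *d_A, int *B, int n)
{
    int index = blockIdx.x * blockDim.x + threadIdx.x;
    if (index < n) {
        atomicAdd(B, d_A[index]);
    }
}

// Hypothetical host helper: copies the data to the device,
// zeroes the accumulator, launches the kernel, returns the sum.
int sumOnDevice(const std::vector<int> &h_A)
{
    int n = static_cast<int>(h_A.size());
    if (n == 0) return 0;

    int *d_A = nullptr;
    int *d_B = nullptr;
    cudaMalloc(&d_A, n * sizeof(int));
    cudaMalloc(&d_B, sizeof(int));
    cudaMemcpy(d_A, h_A.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemset(d_B, 0, sizeof(int));  // initialize B[0] to a known value

    int threads = 256;                         // assumed block size
    int blocks = (n + threads - 1) / threads;  // round up to cover n
    sumElements<<<blocks, threads>>>(d_A, d_B, n);

    int result = 0;
    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(&result, d_B, sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(d_A);
    cudaFree(d_B);
    return result;
}

int main()
{
    std::vector<int> h_A(1000, 1);  // 1000 ones
    printf("sum = %d\n", sumOnDevice(h_A));
    return 0;
}
```

Every thread still funnels into one address, so this is correct but serializes the additions. It is fine for small arrays and for checking that the rest of the code works.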

If you want to do this efficiently, however, you should study the CUDA reduction sample or just use CUB.
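For the CUB route, a rough sketch using cub::DeviceReduce::Sum might look like the following. The helper name sumWithCub and the host-side setup are assumptions for illustration. The two-call pattern (first call with a null temporary buffer to query its size, second call to do the work) is how CUB device-wide algorithms are normally used. Error checking is again omitted for brevity.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>
#include <cub/cub.cuh>

// Hypothetical host helper: sums h_in on the device with CUB.
int sumWithCub(const std::vector<int> &h_in)
{
    int n = static_cast<int>(h_in.size());
    if (n == 0) return 0;

    int *d_in = nullptr;
    int *d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(int));
    cudaMalloc(&d_out, sizeof(int));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    // First call: d_temp is null, so CUB only reports how much
    // temporary storage it needs in temp_bytes.
    void *d_temp = nullptr;
    size_t temp_bytes = 0;
    cub::DeviceReduce::Sum(d_temp, temp_bytes, d_in, d_out, n);

    // Second call: with storage allocated, the reduction runs.
    cudaMalloc(&d_temp, temp_bytes);
    cub::DeviceReduce::Sum(d_temp, temp_bytes, d_in, d_out, n);

    int result = 0;
    cudaMemcpy(&result, d_out, sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(d_temp);
    cudaFree(d_in);
    cudaFree(d_out);
    return result;
}

int main()
{
    std::vector<int> h_in(1000, 1);  // 1000 ones
    printf("sum = %d\n", sumWithCub(h_in));
    return 0;
}
```

CUB ships with recent CUDA toolkits, so with nvcc this should compile without extra include paths. With an older toolkit you may need to download CUB separately.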

And be sure to use proper CUDA error checking any time you are having trouble with CUDA code.
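One common way to do that error checking is to wrap every runtime API call in a macro and to check for errors after each kernel launch. This is only a sketch: the macro name CUDA_CHECK and the empty noop kernel are made up for illustration, while cudaGetLastError, cudaDeviceSynchronize and cudaGetErrorString are standard runtime API calls.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical macro: aborts with file and line if a CUDA
// runtime call returns anything other than cudaSuccess.
#define CUDA_CHECK(call)                                          \
    do {                                                          \
        cudaError_t err_ = (call);                                \
        if (err_ != cudaSuccess) {                                \
            fprintf(stderr, "CUDA error at %s:%d: %s\n",          \
                    __FILE__, __LINE__,                           \
                    cudaGetErrorString(err_));                    \
            exit(EXIT_FAILURE);                                   \
        }                                                         \
    } while (0)

// Placeholder kernel standing in for the real one.
__global__ void noop() {}

int main()
{
    int *d_B = nullptr;
    CUDA_CHECK(cudaMalloc(&d_B, sizeof(int)));
    CUDA_CHECK(cudaMemset(d_B, 0, sizeof(int)));

    noop<<<1, 32>>>();
    CUDA_CHECK(cudaGetLastError());       // catches launch errors
    CUDA_CHECK(cudaDeviceSynchronize());  // catches errors during execution

    CUDA_CHECK(cudaFree(d_B));
    printf("no CUDA errors reported\n");
    return 0;
}
```

A kernel launch does not return an error code itself, which is why the launch is followed by both cudaGetLastError and a synchronizing call.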

So, if you still can't get sensible results, please instrument your code with proper CUDA error checking before asking "I made this change but it still doesn't work, why?" I can't tell you why, because this is the only snippet of code that you've shown.
