CUDA atomicAdd across blocks
I cannot get the atomicAdd function to work across all blocks. It turns out that the following kernel code gives me the total number of threads in a block (< 5000, for example):
__global__ void kernelCode(float *result)
{
    int index = threadIdx.x + blockIdx.x * blockDim.x;
    if (index < 5000)
    {
        atomicAdd(result, 1.0f);
    }
}
Can you please tell me how to add to a single value without allocating a whole array of 1.0f values? I'm using this code on a system with very limited resources, so every bit counts.
This code can work across multiple blocks without allocating an array of 1.0f. The if (index < 5000) statement is not intended to limit you to a single threadblock. It is intended to make sure that only legitimate threads in the entire grid take part in the operation.
Try something like this:
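#include <cstdio>    // fprintf
#include <cstdlib>   // exit
#include <iostream>

#define TOTAL_SIZE 100000   // number of threads that should contribute
#define nTPB 256            // threads per block

// Check the most recent CUDA runtime error and abort with a message on failure.
#define cudaCheckErrors(msg) \
    do { \
        cudaError_t __err = cudaGetLastError(); \
        if (__err != cudaSuccess) { \
            fprintf(stderr, "Fatal error: %s (%s at %s:%d)\n", \
                    msg, cudaGetErrorString(__err), \
                    __FILE__, __LINE__); \
            fprintf(stderr, "*** FAILED - ABORTING\n"); \
            exit(1); \
        } \
    } while (0)

// Every thread with a valid global index adds 1.0f to the single float at *result.
__global__ void kernelCode(float *result)
{
    int index = threadIdx.x + blockIdx.x * blockDim.x;
    if (index < TOTAL_SIZE)   // skip surplus threads in the last block
    {
        atomicAdd(result, 1.0f);
    }
}

int main()
{
    float h_result, *d_result;

    // Only one float is allocated on the device, not an array.
    cudaMalloc((void **)&d_result, sizeof(float));
    cudaCheckErrors("cuda malloc fail");

    // Initialize the accumulator to zero.
    h_result = 0.0f;
    cudaMemcpy(d_result, &h_result, sizeof(float), cudaMemcpyHostToDevice);
    cudaCheckErrors("cudaMemcpy 1 fail");

    // Round the block count up so that at least TOTAL_SIZE threads are launched.
    kernelCode<<<(TOTAL_SIZE + nTPB - 1) / nTPB, nTPB>>>(d_result);
    cudaDeviceSynchronize();
    cudaCheckErrors("kernel fail");

    cudaMemcpy(&h_result, d_result, sizeof(float), cudaMemcpyDeviceToHost);
    cudaCheckErrors("cudaMemcpy 2 fail");

    std::cout << "result = " << h_result << std::endl;
    cudaFree(d_result);
    return 0;
}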
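As a worked check of the launch configuration in the code above, using its values TOTAL_SIZE = 100000 and nTPB = 256 (the division is integer division, so it rounds down):

```latex
\[
\text{blocks} = \left\lfloor \frac{100000 + 256 - 1}{256} \right\rfloor
              = \left\lfloor \frac{100255}{256} \right\rfloor = 391,
\qquad
391 \times 256 = 100096 \ \text{threads launched.}
\]
```

The grid therefore contains 96 more threads than TOTAL_SIZE. Those are the threads that the if (index < TOTAL_SIZE) test excludes, while all 391 blocks still contribute to the same sum.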
You can change TOTAL_SIZE to any number that will conveniently fit in a float.
Note that I typed this code in the browser, so there may be typographical errors.
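On the point about TOTAL_SIZE fitting in a float: a float represents consecutive integers exactly only up to 2^24 (16,777,216), so a count of 1.0f increments stops being exact beyond that. If the goal is purely to count, one possible alternative (not part of the answer above) is to accumulate into an unsigned int, which atomicAdd also supports and which takes the same 4 bytes of device memory. The following is a minimal sketch under that assumption; the names countKernel and blocksFor are hypothetical, introduced here only for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define TOTAL_SIZE 100000   // same element count as in the answer's example
#define nTPB 256            // threads per block, as in the answer's example

// Hypothetical helper: number of blocks needed to cover n threads (rounds up).
unsigned int blocksFor(unsigned int n, unsigned int tpb)
{
    return (n + tpb - 1) / tpb;
}

// Hypothetical integer variant of kernelCode: each in-range thread adds 1
// to a single unsigned int shared by the whole grid.
__global__ void countKernel(unsigned int *result)
{
    unsigned int index = threadIdx.x + blockIdx.x * blockDim.x;
    if (index < TOTAL_SIZE)
    {
        atomicAdd(result, 1u);   // unsigned int overload of atomicAdd
    }
}

int main()
{
    unsigned int h_result = 0;
    unsigned int *d_result = nullptr;

    // A single 4-byte counter on the device, not an array.
    if (cudaMalloc((void **)&d_result, sizeof(unsigned int)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }
    cudaMemcpy(d_result, &h_result, sizeof(unsigned int), cudaMemcpyHostToDevice);

    countKernel<<<blocksFor(TOTAL_SIZE, nTPB), nTPB>>>(d_result);

    // Catch both launch errors and errors raised while the kernel runs.
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        cudaFree(d_result);
        return 1;
    }

    cudaMemcpy(&h_result, d_result, sizeof(unsigned int), cudaMemcpyDeviceToHost);
    printf("count = %u\n", h_result);

    cudaFree(d_result);
    return 0;
}
```

This only helps when the quantity being accumulated is an integer. If each thread adds a genuinely fractional value, the float version is still the one to use.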