CUDA atomicAdd() with long long int
Any time I try to use atomicAdd with anything other than (int*, int), I get this error:
error: no instance of overloaded function "atomicAdd" matches the argument list
But I need to use a larger data type than int. Is there any workaround?
Device Query:
/usr/local/cuda/samples/1_Utilities/deviceQuery/deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "GeForce GTX 680"
CUDA Driver Version / Runtime Version 5.0 / 5.0
CUDA Capability Major/Minor version number: 3.0
Total amount of global memory: 4095 MBytes (4294246400 bytes)
( 8) Multiprocessors x (192) CUDA Cores/MP: 1536 CUDA Cores
GPU Clock rate: 1084 MHz (1.08 GHz)
Memory Clock rate: 3004 Mhz
Memory Bus Width: 256-bit
L2 Cache Size: 524288 bytes
Max Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536,65536), 3D=(4096,4096,4096)
Max Layered Texture Size (dim) x layers 1D=(16384) x 2048, 2D=(16384,16384) x 2048
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Maximum sizes of each dimension of a block: 1024 x 1024 x 64
Maximum sizes of each dimension of a grid: 2147483647 x 65535 x 65535
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device PCI Bus ID / PCI location ID: 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 5.0, CUDA Runtime Version = 5.0, NumDevs = 1, Device0 = GeForce GTX 680
My guess is wrong compile flags. To use the 64-bit (unsigned long long int) overload of atomicAdd, you need to compile for sm_12 or higher. For the GTX 680 above (compute capability 3.0), -arch=sm_30 matches the card.
As stated by Robert Crovella, unsigned long long int is supported, but long long int is not.
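Because only the unsigned 64-bit overload exists, a common workaround for a signed long long counter is to pass its address to that overload through a cast. This works because both types are 64-bit two's complement on CUDA platforms, so unsigned addition modulo 2^64 produces the same bits as signed addition, and adding a negative value simply wraps. The sketch below assumes that representation. atomicAddSigned and addViaUnsigned are hypothetical helper names, not CUDA API. The device function is compiled only by nvcc (guarded by __CUDACC__), and the host function models the same arithmetic so it can be checked without a GPU.

```cpp
#include <cassert>
#include <climits>

// Assumption: long long is 64-bit two's complement, as on all CUDA platforms.
static_assert(sizeof(long long) * CHAR_BIT == 64, "long long must be 64 bits");

#ifdef __CUDACC__
// Hypothetical helper (not part of the CUDA runtime): signed 64-bit atomic add
// routed through the unsigned long long overload of atomicAdd (sm_12+).
// Returns the value stored at *address before the addition, like atomicAdd.
__device__ long long atomicAddSigned(long long* address, long long val) {
    return static_cast<long long>(
        atomicAdd(reinterpret_cast<unsigned long long*>(address),
                  static_cast<unsigned long long>(val)));
}
#endif

// Host-side model of the same arithmetic: add as unsigned (wrapping modulo
// 2^64), then read the bits back as signed.
long long addViaUnsigned(long long a, long long b) {
    unsigned long long sum = static_cast<unsigned long long>(a) +
                             static_cast<unsigned long long>(b);
    return static_cast<long long>(sum);
}
```

In a kernel, calling atomicAddSigned(p, -3) on a long long* p in global memory would then atomically subtract 3 from a signed counter.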
I used the code from: Beginner CUDA - Simple var increment not working
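#include <iostream>
using namespace std;
// Each thread atomically adds 1 to a 64-bit counter (this overload needs sm_12+).
__global__ void inc(unsigned long long int *foo) {
    atomicAdd(foo, 1ULL);
}
int main() {
    unsigned long long int count = 0, *cuda_count;
    cudaMalloc((void**)&cuda_count, sizeof(unsigned long long int));
    cudaMemcpy(cuda_count, &count, sizeof(unsigned long long int), cudaMemcpyHostToDevice);
    cout << "count: " << count << '\n';
    // 100 blocks x 25 threads = 2500 increments
    inc <<< 100, 25 >>> (cuda_count);
    // Check for launch errors (e.g. no binary for this GPU's architecture),
    // then for errors while the kernel ran.
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        cerr << "CUDA error: " << cudaGetErrorString(err) << '\n';
        cudaFree(cuda_count);
        return 1;
    }
    cudaMemcpy(&count, cuda_count, sizeof(unsigned long long int), cudaMemcpyDeviceToHost);
    cudaFree(cuda_count);
    cout << "count: " << count << '\n';
    return 0;
}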
Compiled on Linux with: nvcc -gencode arch=compute_12,code=sm_12 -o add add.cu
Result:
count: 0
count: 2500
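Another general workaround for a type with no matching atomicAdd overload is a compare-and-swap loop around atomicCAS, which also has an unsigned long long overload on sm_12 and higher. The CUDA C Programming Guide uses the same pattern to build an atomicAdd for double. In this sketch, atomicAddCAS and fetchAddCAS are hypothetical names. The device version is compiled only by nvcc, and fetchAddCAS is a host-side analogue using std::atomic (not GPU code) so the retry logic can be exercised without a GPU.

```cpp
#include <atomic>
#include <cassert>

#ifdef __CUDACC__
// Hypothetical helper (not part of the CUDA runtime): signed 64-bit atomic add
// built from atomicCAS on unsigned long long (sm_12+). Returns the old value.
__device__ long long atomicAddCAS(long long* address, long long val) {
    unsigned long long* addr = reinterpret_cast<unsigned long long*>(address);
    unsigned long long old = *addr;
    unsigned long long assumed;
    do {
        assumed = old;
        // Compute in unsigned arithmetic so overflow wraps instead of being UB.
        unsigned long long updated = assumed + static_cast<unsigned long long>(val);
        // atomicCAS stores `updated` only if *addr still equals `assumed`;
        // it always returns the previous contents, so a mismatch means retry.
        old = atomicCAS(addr, assumed, updated);
    } while (assumed != old);
    return static_cast<long long>(old);
}
#endif

// Host-side analogue with std::atomic. On failure, compare_exchange_weak
// reloads `expected` with the current value, so the new value is recomputed
// from fresh data on every attempt. Returns the old value on success.
long long fetchAddCAS(std::atomic<unsigned long long>& target, long long val) {
    unsigned long long expected = target.load();
    while (!target.compare_exchange_weak(
        expected, expected + static_cast<unsigned long long>(val))) {
        // retry with the reloaded `expected`
    }
    return static_cast<long long>(expected);
}
```

The loop costs extra retries under heavy contention, so for a plain signed add a single call to the unsigned long long atomicAdd overload is usually preferable. The CAS pattern is mainly useful for operations the hardware does not provide directly.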