Issue with atomicAdd in a CUDA kernel

Question

I'm having an issue with my kernel.cu file.

When I compile with nvcc -v kernel.cu -o kernel.o, I get this error:

kernel.cu(17): error: identifier "atomicAdd" is undefined

My code:

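#include "dot.h"          // presumably defines THREADS_PER_BLOCK
#include <cuda.h>
#include "device_functions.h" //might call atomicAdd

// Each block computes a partial dot product and adds it into *c.
__global__ void dot (int *a, int *b, int *c){
    __shared__ int temp[THREADS_PER_BLOCK];
    int index = threadIdx.x + blockIdx.x * blockDim.x;
    temp[threadIdx.x] = a[index] * b[index];   // one product per thread

    __syncthreads();   // wait until every product in the block is stored

    if( 0 == threadIdx.x ){
        // thread 0 sums the block's products and adds them to the global total
        int sum = 0;
        for( int i = 0; i<THREADS_PER_BLOCK; i++)
            sum += temp[i];
        atomicAdd(c, sum);
    }
}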

Any suggestions?

Answer 1

You need to tell nvcc to target an architecture that supports atomic memory operations (the default architecture is 1.0, which does not support atomics). Try:

nvcc -arch=sm_11 -v kernel.cu -o kernel.o

and see what happens.

EDIT (2015): the default architecture in CUDA 7.0 is now 2.0, which supports atomic memory operations, so this should no longer be a problem with newer toolkit versions.
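A minimal, self-contained sketch of the question's kernel, built the way this answer suggests. It assumes THREADS_PER_BLOCK is 256 (the real value lives in the asker's dot.h, which isn't shown), adds a hypothetical length parameter n with a bounds check, and wraps the launch in a hypothetical host helper dot_on_gpu. The #if guard on __CUDA_ARCH__ is optional: it only turns a compile for compute capability 1.0 into a readable error instead of "identifier atomicAdd is undefined".

```cuda
#include <cuda_runtime.h>

// Assumption: the block size; the asker's real value comes from dot.h.
#define THREADS_PER_BLOCK 256

// Optional guard: stop the device compilation pass if it targets compute
// capability 1.0, which has no atomicAdd on global memory.
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ < 110
#error "atomicAdd on global memory needs -arch=sm_11 or higher"
#endif

// n is a hypothetical element count; it need not be a multiple of the block size.
__global__ void dot(const int *a, const int *b, int *c, int n) {
    __shared__ int temp[THREADS_PER_BLOCK];
    int index = threadIdx.x + blockIdx.x * blockDim.x;
    // Threads past the end of the input contribute 0 instead of reading out of bounds.
    temp[threadIdx.x] = (index < n) ? a[index] * b[index] : 0;

    __syncthreads();  // all products in this block are now in shared memory

    if (threadIdx.x == 0) {
        int sum = 0;
        for (int i = 0; i < THREADS_PER_BLOCK; i++)
            sum += temp[i];
        atomicAdd(c, sum);  // add this block's partial sum to the global total
    }
}

// Hypothetical host helper: copies a and b to the device, runs dot, and
// stores the result in *result. Returns false on any CUDA error.
static bool dot_on_gpu(const int *a, const int *b, int n, int *result) {
    int *d_a = nullptr, *d_b = nullptr, *d_c = nullptr;
    size_t bytes = n * sizeof(int);
    bool ok = cudaMalloc(&d_a, bytes) == cudaSuccess &&
              cudaMalloc(&d_b, bytes) == cudaSuccess &&
              cudaMalloc(&d_c, sizeof(int)) == cudaSuccess &&
              cudaMemcpy(d_a, a, bytes, cudaMemcpyHostToDevice) == cudaSuccess &&
              cudaMemcpy(d_b, b, bytes, cudaMemcpyHostToDevice) == cudaSuccess &&
              // *c must start at 0 because every block accumulates into it
              cudaMemset(d_c, 0, sizeof(int)) == cudaSuccess;
    if (ok) {
        int blocks = (n + THREADS_PER_BLOCK - 1) / THREADS_PER_BLOCK;
        dot<<<blocks, THREADS_PER_BLOCK>>>(d_a, d_b, d_c, n);
        // cudaMemcpy waits for the kernel and reports any execution error
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(result, d_c, sizeof(int), cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_a);  // cudaFree(nullptr) is a no-op, so partial failures are safe
    cudaFree(d_b);
    cudaFree(d_c);
    return ok;
}
```

Compile it with an explicit -arch value that both your toolkit and your GPU support (for example nvcc -arch=sm_XX, with XX matching your device), then call dot_on_gpu from your own main.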

Answer 2

With the latest CUDA SDK and toolkit, this solution no longer works. People also say that adding:

compute_11,sm_11; OR compute_12,sm_12; OR compute_13,sm_13;
compute_20,sm_20;
compute_30,sm_30;

to the CUDA settings in the Visual Studio 2010 project properties will fix it. It doesn't.

You have to specify this for the .cu file itself, in that file's own properties (under the C++/CUDA->Device->Code Generation tab), for example:

compute_13,sm_13;
compute_20,sm_20;
compute_30,sm_30;
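To decide which compute_XX,sm_XX entry belongs in that field, it can help to ask the GPU for its compute capability. A small sketch using the CUDA runtime's cudaGetDeviceProperties; the helper name codegen_entry_for_device is hypothetical. Note that the toolkit must also support the resulting target: recent toolkits have dropped the sm_1x targets listed above.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: formats the "compute_XY,sm_XY" entry that matches the
// compute capability of device `dev` into buf. Returns false if the device
// cannot be queried (no GPU, bad index, driver problem).
static bool codegen_entry_for_device(int dev, char *buf, size_t len) {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess)
        return false;
    // e.g. compute capability 1.3 -> "compute_13,sm_13"
    std::snprintf(buf, len, "compute_%d%d,sm_%d%d",
                  prop.major, prop.minor, prop.major, prop.minor);
    return true;
}
```

Print the buffer from a small main and paste it into the .cu file's Code Generation field.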

2 answers

solution1: 14, accepted, 2011-05-13 16:26:58
solution2: 1, 2012-05-13 16:04:10
