CUDA shared memory atomics error

Question

I am using a Tesla C1060 (compute capability 1.3) with nvcc compiler driver 4.0. I am trying to do some computation local to each thread block. Each thread block has a shared array that is first initialized to zero. To synchronize concurrent updates (additions) to the shared data by the threads of the block, I use the CUDA atomicAdd primitive.

Once a thread block has its results ready in the shared data array, each entry of that array is merged in turn (using atomicAdd) into the corresponding entry of a global data array.

Below is code very similar to what I am trying to do.

#define DATA_SZ 16
typedef unsigned long long int ULLInt;

__global__ void kernel( ULLInt* data, ULLInt ThreadCount )
{
  ULLInt thid = threadIdx.x + blockIdx.x * blockDim.x;
  __shared__ ULLInt sharedData[DATA_SZ];

  // Initialize the shared data
  if( threadIdx.x == 0 )
  {
    for( int i = 0; i < DATA_SZ; i++ ) { sharedData[i] = 0; }
  }
  __syncthreads();

  //..some code here

  if( thid < ThreadCount )
  {
    //..some code here

    atomicAdd( &sharedData[getIndex(thid)], thid );

    //..some code here        

    for(..a loop...)
    { 
      //..some code here

      if(thid % 2 == 0)
      {           
        // getIndex() returns a value in [0, DATA_SZ )
        atomicAdd( &sharedData[getIndex(thid)], thid * thid );
      }
    }
  }
  __syncthreads();

  if( threadIdx.x == 0 )
  {
    // ...
    for( int i = 0; i < DATA_SZ; i++ ) { atomicAdd( &data[i], sharedData[i] ); }
    //...
  }
}

If I compile with -arch=sm_20 I don't get any errors. However, when I compile the kernel with the -arch=sm_13 option I get the following errors:

ptxas /tmp/tmpxft_00004dcf_00000000-2_mycode.ptx, line error   : Global state space expected for instruction 'atom'
ptxas /tmp/tmpxft_00004dcf_00000000-2_mycode.ptx, line error   : Global state space expected for instruction 'atom'
ptxas fatal   : Ptx assembly aborted due to errors

If I comment out the following two lines, I don't get any errors with -arch=sm_13:

atomicAdd( &sharedData[getIndex(thid)], thid );
atomicAdd( &sharedData[getIndex(thid)], thid * thid );

Can someone suggest what I might be doing wrong?

Answer 1

Found the solution in the CUDA C Programming Guide: "Atomic functions operating on shared memory and atomic functions operating on 64-bit words are only available for devices of compute capability 1.2 and above. Atomic functions operating on 64-bit words in shared memory are only available for devices of compute capability 2.x and higher."

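So basically I cannot use ULLInt for the shared memory array here. The two failing atomicAdd calls operate on 64-bit words in shared memory, which compute capability 1.3 does not support, so I need to use unsigned int for the shared array instead.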
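
For illustration, here is a minimal sketch of how the kernel could be restructured along those lines. It is not the original code: getIndex() below is a hypothetical stand-in (the question only says it returns a value in [0, DATA_SZ)), runKernel() is a hypothetical host helper, and the sketch assumes that each block's partial sums fit in 32 bits. The shared array holds unsigned int, and only the final merge into global memory uses the 64-bit atomicAdd, which the quoted passage says is available from compute capability 1.2.

```cuda
#include <cuda_runtime.h>

#define DATA_SZ 16
typedef unsigned long long int ULLInt;

// Hypothetical stand-in for the question's getIndex(); returns a value in [0, DATA_SZ).
__device__ int getIndex(ULLInt thid) { return (int)(thid % DATA_SZ); }

__global__ void kernel(ULLInt* data, ULLInt threadCount)
{
    ULLInt thid = threadIdx.x + (ULLInt)blockIdx.x * blockDim.x;

    // 32-bit counters, since 64-bit shared-memory atomics need compute capability 2.x.
    // Assumption: the per-block partial sums do not overflow 32 bits.
    __shared__ unsigned int sharedData[DATA_SZ];

    // Zero the shared array cooperatively, then wait for all threads.
    for (int i = threadIdx.x; i < DATA_SZ; i += blockDim.x) { sharedData[i] = 0; }
    __syncthreads();

    if (thid < threadCount)
    {
        // 32-bit atomicAdd on shared memory.
        atomicAdd(&sharedData[getIndex(thid)], (unsigned int)thid);
    }
    __syncthreads();

    // Merge into the 64-bit global totals with the 64-bit global atomicAdd.
    for (int i = threadIdx.x; i < DATA_SZ; i += blockDim.x)
    {
        atomicAdd(&data[i], (ULLInt)sharedData[i]);
    }
}

// Hypothetical host helper: runs the kernel over threadCount (> 0) logical
// threads and copies the DATA_SZ totals back into hostOut.
cudaError_t runKernel(unsigned int threadCount, ULLInt* hostOut)
{
    ULLInt* devData = 0;
    cudaError_t err = cudaMalloc((void**)&devData, DATA_SZ * sizeof(ULLInt));
    if (err != cudaSuccess) { return err; }

    err = cudaMemset(devData, 0, DATA_SZ * sizeof(ULLInt));
    if (err == cudaSuccess)
    {
        const unsigned int threadsPerBlock = 64;
        unsigned int blocks = (threadCount + threadsPerBlock - 1) / threadsPerBlock;
        kernel<<<blocks, threadsPerBlock>>>(devData, threadCount);
        // cudaMemcpy waits for the kernel and reports any launch failure.
        err = cudaMemcpy(hostOut, devData, DATA_SZ * sizeof(ULLInt),
                         cudaMemcpyDeviceToHost);
    }
    cudaFree(devData);
    return err;
}
```

If the per-block sums could exceed 32 bits, this approach would silently wrap around, so the counter width is a trade-off to check against the real data rather than a drop-in fix.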
