
CUDA atomicAdd for doubles definition error

In previous versions of CUDA, atomicAdd was not implemented for doubles, so it is common to implement it as shown here. With the new CUDA 8 RC, I run into trouble when I try to compile code that includes such a function. I guess this is due to the fact that with Pascal and Compute Capability 6.0, a native double version of atomicAdd has been added, but somehow it is not properly ignored for earlier compute capabilities.

The code below used to compile and run fine with previous CUDA versions, but now I get this compilation error:

test.cu(3): error: function "atomicAdd(double *, double)" has already been defined

But if I remove my implementation, I instead get this error:

test.cu(33): error: no instance of overloaded function "atomicAdd" matches the argument list
            argument types are: (double *, double)

I should add that I only see this if I compile with -arch=sm_35 or similar. If I compile with -arch=sm_60 I get the expected behavior, i.e., only the first error, and successful compilation in the second case.

Edit: Also, it is specific to atomicAdd; if I change the name, everything works fine.

It really looks like a compiler bug. Can someone else confirm that this is the case?

Example code:

__device__ double atomicAdd(double* address, double val)
{
    unsigned long long int* address_as_ull = (unsigned long long int*)address;
    unsigned long long int old = *address_as_ull, assumed;
    do {
        assumed = old;
        old = atomicCAS(address_as_ull, assumed,
                __double_as_longlong(val + __longlong_as_double(assumed)));
    } while (assumed != old);
    return __longlong_as_double(old);
}

__global__ void kernel(double *a)
{
    double b=1.3;
    atomicAdd(a,b);
}

int main(int argc, char **argv)
{
    double *a;
    cudaMalloc(&a,sizeof(double));

    kernel<<<1,1>>>(a);

    cudaFree(a);
    return 0;
}

Edit: I got an answer from Nvidia, who acknowledge this problem. Here is what the developers say about it:

The sm_60 architecture, newly supported in CUDA 8.0, has a native fp64 atomicAdd function. Because of the limitations of our toolchain and the CUDA language, the declaration of this function needs to be present even when the code is not being specifically compiled for sm_60. This causes a problem in your code because you also define an fp64 atomicAdd function.

CUDA builtin functions such as atomicAdd are implementation-defined and can be changed between CUDA releases. Users should not define functions with the same names as any CUDA builtin function. We would suggest you rename your atomicAdd function to one that does not clash with any CUDA builtin function.

That flavor of atomicAdd is a new method introduced for compute capability 6.0. You may keep your previous implementation for other compute capabilities by guarding it with a macro definition:

#if !defined(__CUDA_ARCH__) || __CUDA_ARCH__ >= 600
#else
<... place here your own pre-pascal atomicAdd definition ...>
#endif

This architecture identification macro is documented here:

5.7.4. Virtual Architecture Identification Macro

The architecture identification macro __CUDA_ARCH__ is assigned a three-digit value string xy0 (ending in a literal 0) during each nvcc compilation stage 1 that compiles for compute_xy.

This macro can be used in the implementation of GPU functions to determine the virtual architecture for which the code is currently being compiled. The host code (the non-GPU code) must not depend on it.

I assume NVIDIA did not place it for previous compute capabilities to avoid conflicts for users who define it themselves and have not moved to Compute Capability >= 6.x. I would not consider it a bug, though, rather a release delivery practice.

EDIT: the macro guard was incomplete (fixed); here is a complete example.

#if !defined(__CUDA_ARCH__) || __CUDA_ARCH__ >= 600
#else
// dummy pre-Pascal implementation, used here only to test compilation
__device__ double atomicAdd(double* a, double b) { return b; }
#endif

__device__ double s_global ;
__global__ void kernel () { atomicAdd (&s_global, 1.0) ; }


int main (int argc, char* argv[])
{
        kernel<<<1,1>>> () ;
        return ::cudaDeviceSynchronize () ;
}

Compilation with:

$> nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Wed_May__4_21:01:56_CDT_2016
Cuda compilation tools, release 8.0, V8.0.26

Command lines (both successful):

$> nvcc main.cu -arch=sm_60
$> nvcc main.cu -arch=sm_35

You may find out why it works by looking at the include file sm_60_atomic_functions.h, where the method is not declared if __CUDA_ARCH__ is lower than 600.
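The guard in that header presumably has roughly this shape (a paraphrase for illustration, not the verbatim header contents):

```cpp
/* Illustrative paraphrase of the guard in sm_60_atomic_functions.h:
   the declaration is visible to the host pass (where __CUDA_ARCH__ is
   undefined) and to sm_60+ device passes, but hidden from pre-Pascal
   device passes, which is why the user-defined version only conflicts
   when it is compiled unguarded. */
#if !defined(__CUDA_ARCH__) || __CUDA_ARCH__ >= 600
__device__ double atomicAdd(double *address, double val);
#endif
```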
