Can 64-bit atomic operations work in OpenCL on AMD cards?

Question

The implementation of emulated atomics in OpenCL following the STREAM blog works nicely for atomic add in 32-bit, on CPUs as well as on NVIDIA and AMD GPUs.

The 64-bit equivalent based on the cl_khr_int64_base_atomics extension seems to run properly on CPU (pocl and Intel) as well as with the NVIDIA OpenCL drivers.

I cannot get the 64-bit version to work on AMD GPU cards, though. It fails in both the amdgpu-pro and ROCm (3.5.0) environments, running on a Radeon VII and a Radeon Instinct MI50, respectively.

The implementation goes as follows:

inline void atomicAdd(volatile __global double *addr, double val)
{
    union {
        long u64;
        double f64;
    } next, expected, current;
    current.f64 = *addr;
    do {
        expected.f64 = current.f64;
        next.f64 = expected.f64 + val;
        current.u64 = atomic_cmpxchg(
            (volatile __global long *)addr,
            (long) expected.u64,
            (long) next.u64);
    } while( current.u64 != expected.u64 );
}

In the absence of atomic operations for double types, the idea is to reinterpret the value as a long, since the bits only need to be stored (no arithmetic is performed on them). One should then be able to use long atom_cmpxchg(__global long *p, long cmp, long val), as defined in the Khronos manual for int64 base atomics.

The error I receive in both AMD environments suggests a fallback to the 32-bit versions. The compiler does not seem to recognise the 64-bit versions despite the #pragma:


/tmp/comgr-0bdbdc/input/CompileSource:21:17: error: call to 'atomic_cmpxchg' is ambiguous
                current.u64 = atomic_cmpxchg(
                              ^~~~~~~~~~~~~~
[...]/centos_pipeline_job_3.5/rocm-rel-3.5/rocm-3.5-30-20200528/7.5/out/centos-7/7/build/amd_comgr/<stdin>:13468:12: note: candidate function
int __ovld atomic_cmpxchg(volatile __global int *p, int cmp, int val);
           ^
[...]/centos_pipeline_job_3.5/rocm-rel-3.5/rocm-3.5-30-20200528/7.5/out/centos-7/7/build/amd_comgr/<stdin>:13469:21: note: candidate function
unsigned int __ovld atomic_cmpxchg(volatile __global unsigned int *p, unsigned int cmp, unsigned int val);
                    ^
1 error generated.
Error: Failed to compile opencl source (from CL or HIP source to LLVM IR).

I do find cl_khr_int64_base_atomics in the clinfo extension list in both environments, though. The string cl_khr_int64_base is also present in the OpenCL driver binary file.

Does anybody have an idea what might be going wrong here? Using the same implementation for 32-bit (int and float instead of long and double) works flawlessly for me.

Thanks for any hints.

Answer 1

For 64-bit, the function is called atom_cmpxchg and not atomic_cmpxchg. That is consistent with the error message, which lists only the int and unsigned int overloads of atomic_cmpxchg as candidates.
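Applied to the code from the question, the fix could look like the sketch below. It is the asker's function with only the call name changed and the extension pragmas written out. The cl_khr_fp64 pragma is an assumption on my part (it may be required for double on OpenCL 1.x devices and should be harmless elsewhere), and I have not run this on a Radeon VII or MI50 myself.

```c
// Enable the 64-bit atom_* functions (extension named in the question).
#pragma OPENCL EXTENSION cl_khr_int64_base_atomics : enable
// Assumption: some OpenCL 1.x platforms also need this to use double.
#pragma OPENCL EXTENSION cl_khr_fp64 : enable

inline void atomicAdd(volatile __global double *addr, double val)
{
    // View the same 64 bits either as an integer (for the CAS)
    // or as a double (for the addition).
    union {
        long u64;
        double f64;
    } next, expected, current;

    current.f64 = *addr;
    do {
        expected.f64 = current.f64;
        next.f64 = expected.f64 + val;
        // atom_cmpxchg (not atomic_cmpxchg) is the 64-bit variant.
        // It returns the value found at addr before the operation.
        current.u64 = atom_cmpxchg(
            (volatile __global long *)addr,
            expected.u64,
            next.u64);
        // Retry if another work-item changed *addr in the meantime.
    } while (current.u64 != expected.u64);
}
```

The loop compares the integer bit patterns rather than the double values, so it still terminates when the stored value is a NaN, which never compares equal to itself as a double.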
