Atomic addition to floating-point values in OpenCL for NVIDIA GPUs?

Question

The OpenCL 3.0 specification does not seem to have intrinsics/builtins for atomic addition to floating-point values, only to integral values (and that seems to have been the case in OpenCL 1.x and 2.x as well). CUDA, however, has offered floating-point atomics for a while now:

float  atomicAdd(float*  address, float  val); // since Fermi
double atomicAdd(double* address, double val); // since Pascal
__half atomicAdd(__half *address, __half val); // ?

Naturally, any straightforward atomic operation can be simulated with compare-and-exchange, which is available in OpenCL. But my questions are:

1. Does NVIDIA expose floating-point atomics in OpenCL somehow, e.g. via a vendor extension, using pragmas, or implicitly?
2. Is there a more efficient mechanism than simulation with compare-exchange that I could consider as a substitute for floating-point atomics, either for NVIDIA GPUs or in general?

Answer 1

Native floating-point atomics are a much-desired extension for OpenCL 3.0. As of right now, they are still not available.

1. The only possible way would be to use inline PTX.
2. No. The implementation with atomic compare-exchange for FP32 and FP64 is currently state of the art, and there is no known better way.
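To make the compare-exchange approach concrete, here is a minimal sketch of the retry loop written as host-side C11 rather than OpenCL C, so it can be compiled and tried without a GPU. The function and helper names (`atomic_add_float_cas`, `float_to_bits`, `bits_to_float`) are made up for illustration. In an OpenCL kernel the same loop would typically call `atomic_cmpxchg` on a `uint` view of the float instead of `atomic_compare_exchange_weak`.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a float as its 32-bit pattern (and back) without aliasing issues. */
static inline uint32_t float_to_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

static inline float bits_to_float(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Hypothetical helper: atomically add val to the float whose bit pattern is
 * stored in *target, using only an integer compare-exchange.
 * Returns the value that was stored before the addition. */
static inline float atomic_add_float_cas(_Atomic uint32_t *target, float val)
{
    uint32_t expected_bits = atomic_load(target);
    for (;;) {
        float expected = bits_to_float(expected_bits);
        uint32_t desired_bits = float_to_bits(expected + val);
        /* Succeeds only if *target still holds expected_bits; on failure,
         * expected_bits is refreshed with the current contents and we retry. */
        if (atomic_compare_exchange_weak(target, &expected_bits, desired_bits))
            return expected;
    }
}
```

Under contention, each failed exchange costs another round trip to memory, which is why a native floating-point atomic, where available, is expected to be faster.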

UPDATE June 2022: Floating-point atomics are being added to the OpenCL 3.0 standard (as the cl_ext_float_atomics extension), but adoption by hardware vendors might still take some time.

Answer 2

As @ProjectPhysX implied in their answer, when you compile OpenCL with NVIDIA's driver, it accepts inline PTX assembly (which is of course not at all part of OpenCL, nor a recognized vendor extension). This lets you do basically anything CUDA offers you, in OpenCL, and that includes atomically adding to floating-point values.

So, here are wrapper functions for atomically adding to single-precision (32-bit) floating-point values in global and in local memory:

float atomic_add_float_global(__global float* p, float val)
{
    float prev;
    asm volatile(
        "atom.global.add.f32 %0, [%1], %2;" 
        : "=f"(prev) 
        : "l"(p) , "f"(val) 
        : "memory" 
    );
    return prev;
}

float atomic_add_float_local(__local float* p, float val)
{
    float prev;
    // Remember "local" in OpenCL means the same as "shared" in CUDA.
    asm volatile(
        "atom.shared.add.f32 %0, [%1], %2;"
        : "=f"(prev) 
        : "l"(p) , "f"(val) 
        : "memory" 
    );
    return prev;
}

One could also perhaps tweak this by checking whether the OpenCL driver is NVIDIA's, in which case the inline assembly is used, or non-NVIDIA, in which case the atomic compare-exchange implementation is used.
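One way to sketch that check is on the host side: read the device's vendor string (for example via `clGetDeviceInfo` with `CL_DEVICE_VENDOR`) and turn it into a build option passed to `clBuildProgram`, so the kernel source can pick an implementation with `#if`. The helper below covers only the string-to-option step. The function name and the `USE_PTX_FLOAT_ATOMICS` macro are hypothetical, and matching on the substring "NVIDIA" is an assumption about how the vendor string is reported.

```c
#include <string.h>

/* Hypothetical helper: choose a kernel build option from the device vendor
 * string. The macro name USE_PTX_FLOAT_ATOMICS is made up for this sketch;
 * the kernel source would test it with #if to select either the inline-PTX
 * wrappers or a compare-exchange fallback. */
static inline const char *float_atomic_build_option(const char *vendor)
{
    if (vendor != NULL && strstr(vendor, "NVIDIA") != NULL)
        return "-DUSE_PTX_FLOAT_ATOMICS=1";
    return "-DUSE_PTX_FLOAT_ATOMICS=0";
}
```

Deciding on the host keeps the kernel source free of any vendor detection logic, at the cost of requiring the option to be passed on every program build.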
