Atomic addition to floating point values in OpenCL for NVIDIA GPUs?

The OpenCL 3.0 specification does not seem to have intrinsics/builtins for atomic addition to floating-point values, only for integral values (and that seems to have been the case in OpenCL 1.x and 2.x as well). CUDA, however, has offered floating-point atomics for a while now:

float  atomicAdd(float*  address, float  val); // since Fermi
double atomicAdd(double* address, double val); // since Pascal
__half atomicAdd(__half *address, __half val); // ?

Naturally, any straightforward atomic operation can be simulated with compare-and-exchange, and this is available in OpenCL. But my questions are:

  1. Does NVIDIA expose floating-point atomics in OpenCL somehow, e.g. via a vendor extension, using pragmas, or implicitly?
  2. Is there a more efficient mechanism than simulation with compare-exchange that I could consider as a substitute for floating-point atomics, either for NVIDIA GPUs or in general?

Native floating-point atomics are a much-desired extension for OpenCL 3.0. As of right now, they are still not available.

  1. The only possible way would be to use inline PTX.
  2. No. The implementation with atomic compare-exchange for FP32 and FP64 is currently state of the art, and there is no known better way.
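For reference, the compare-exchange approach mentioned above can be sketched roughly as follows for FP32 in global memory. This is a minimal sketch, not a tuned implementation: the function name `atomic_add_float_cas` is made up for illustration, and it assumes the legacy 32-bit `atomic_cmpxchg` builtin is available on the device. The `#ifndef __OPENCL_VERSION__` shim is only there so the same text can be checked with a host GCC/Clang compiler; a kernel does not need it.

```c
#ifndef __OPENCL_VERSION__
/* Host-only shim (assumption: GCC/Clang builtins); ignored by OpenCL compilers. */
#define __global
#define atomic_cmpxchg(p, cmp, val) __sync_val_compare_and_swap((p), (cmp), (val))
#endif

/* Hypothetical helper: emulate an atomic float add with a 32-bit CAS loop.
 * Returns the value stored at *p just before the addition took effect. */
float atomic_add_float_cas(volatile __global float* p, float val)
{
    /* Reinterpret the float slot as a 32-bit word, since CAS works on integers. */
    volatile __global unsigned int* word = (volatile __global unsigned int*)p;
    union { unsigned int u; float f; } expected, desired;

    unsigned int seen = *word;
    do {
        expected.u = seen;                 /* bits we believe are in memory    */
        desired.f  = expected.f + val;     /* new value computed from them     */
        /* Store desired only if memory still holds expected; get actual bits. */
        seen = atomic_cmpxchg(word, expected.u, desired.u);
    } while (seen != expected.u);          /* another work-item won: retry     */

    /* Bits are compared as integers, so a NaN in memory cannot stall the loop. */
    return expected.f;
}
```

Under contention each failed exchange costs another round trip, which is why a native instruction is preferable where one exists. An FP64 variant would follow the same pattern with a 64-bit compare-exchange, which depends on the device supporting 64-bit atomics.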

UPDATE June 2022: Floating-point atomics are being added to the OpenCL 3.0 standard, but adoption by hardware vendors might still take some time.

As @ProjectPhysX implied in their answer, when you compile OpenCL with NVIDIA's driver, it accepts inline PTX assembly (which is of course not part of OpenCL at all, nor a recognized vendor extension). This lets you do basically anything CUDA offers you, in OpenCL, and that includes atomically adding to floating-point values.

So, here are wrapper functions for atomically adding to single-precision (32-bit) floating-point values in global and in local memory:

float atomic_add_float_global(__global float* p, float val)
{
    float prev;
    asm volatile(
        "atom.global.add.f32 %0, [%1], %2;" 
        : "=f"(prev) 
        : "l"(p) , "f"(val) 
        : "memory" 
    );
    return prev;
}

float atomic_add_float_local(__local float* p, float val)
{
    float prev;
    // Remember "local" in OpenCL means the same as "shared" in CUDA.
    asm volatile(
        "atom.shared.add.f32 %0, [%1], %2;"
        : "=f"(prev) 
        : "l"(p) , "f"(val) 
        : "memory" 
    );
    return prev;
}

One could also perhaps tweak it by checking whether the OpenCL driver is NVIDIA's, in which case the inline assembly is used, or non-NVIDIA, in which case the atomic-compare-exchange implementation is used.
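One way to make that choice, sketched here under assumptions, is to decide on the host and pass the result to the kernel compiler as a `-D` build option. The helper name `float_atomics_build_option` and the macro `USE_NVIDIA_PTX_ATOMICS` are invented for this example. The vendor string would come from `clGetPlatformInfo` with `CL_PLATFORM_VENDOR`, and the "NVIDIA" substring match is a heuristic rather than anything the standard guarantees.

```c
#include <string.h>

/* Hypothetical helper: choose a build option from the platform vendor string.
 * The returned string is meant to be passed as the options argument of
 * clBuildProgram, so the kernel source can test USE_NVIDIA_PTX_ATOMICS. */
const char* float_atomics_build_option(const char* platform_vendor)
{
    /* Assumption: NVIDIA's platform reports a vendor string containing "NVIDIA". */
    if (platform_vendor != NULL && strstr(platform_vendor, "NVIDIA") != NULL)
        return "-DUSE_NVIDIA_PTX_ATOMICS=1";  /* kernel may use inline PTX      */
    return "-DUSE_NVIDIA_PTX_ATOMICS=0";      /* kernel should use CAS fallback */
}
```

In the kernel source, an `#if USE_NVIDIA_PTX_ATOMICS` block would then wrap the body of `atomic_add_float_global`, with a compare-exchange loop in the `#else` branch, so that non-NVIDIA compilers never see the PTX string.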
