How to atomically add and fetch a 128-bit number in C++?

Question

I use Linux x86_64 and clang 3.3.

Is this even possible in theory?

std::atomic<__int128_t> doesn't work (the link fails with undefined references to some functions).

__atomic_add_fetch also doesn't work ('error: cannot compile this atomic library call yet').

Both std::atomic and __atomic_add_fetch work with 64-bit numbers.

Answer 1

It's not possible to do this with a single instruction, but you can emulate it and still be lock-free. Except for the very earliest AMD64 CPUs, x86-64 CPUs support the CMPXCHG16B instruction. With a little multi-precision math, you can do this pretty easily.
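The "little multi-precision math" amounts to adding the two 64-bit halves separately and propagating the carry out of the low half into the high half. A minimal, non-atomic sketch of just that step follows; the U128 struct and the add_128 name are illustrative, not part of the original answer. The code in this answer wraps the same computation in a CMPXCHG16B retry loop.

```cpp
#include <cstdint>

// Hypothetical 128-bit value split into two 64-bit halves.
struct U128 {
    std::uint64_t lo;
    std::uint64_t hi;
};

// Non-atomic 128-bit addition: add the low halves, then the high halves
// plus the carry out of the low half. Unsigned arithmetic wraps modulo 2^64,
// so a carry occurred exactly when the low sum is smaller than an addend.
U128 add_128(U128 a, U128 b) {
    U128 r;
    r.lo = a.lo + b.lo;
    r.hi = a.hi + b.hi + (r.lo < a.lo ? 1u : 0u);
    return r;
}
```

For example, adding {lo = 2^64 - 1, hi = 0} and {lo = 1, hi = 0} yields {lo = 0, hi = 1}.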

I'm afraid I don't know the GCC intrinsic for CMPXCHG16B, but hopefully you get the idea: spin in a loop around CMPXCHG16B until it succeeds. Here's some untested code for VC++:

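#include <intrin.h>   // _InterlockedCompareExchange128 (MSVC, x64 only)
#include <stdint.h>

// atomically adds 128-bit src to dst, with src getting the old dst.
// Both arrays hold the low 64 bits in [0] and the high 64 bits in [1],
// the layout _InterlockedCompareExchange128 uses; dst must be 16-byte aligned.
void fetch_add_128b(uint64_t *dst, uint64_t *src)
{
    uint64_t srclo, srchi, olddst[2], exchlo, exchhi;

    srclo = src[0];
    srchi = src[1];
    olddst[0] = dst[0];   // initial guess; a torn read is harmless because
    olddst[1] = dst[1];   // the compare-exchange below rejects it

    do
    {
        exchlo = srclo + olddst[0];
        exchhi = srchi + olddst[1] + (exchlo < srclo); // add and carry
    }
    // on failure, the intrinsic copies the current value of *dst into olddst
    while(!_InterlockedCompareExchange128((long long*)dst,
                                          exchhi, exchlo,
                                          (long long*)olddst));

    src[0] = olddst[0];
    src[1] = olddst[1];
}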

Edit: here's some untested code based on what I could find about the GCC intrinsics:

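// atomically adds 128-bit src to dst, returning the old dst.
// GCC/Clang only expand the 16-byte __sync builtin inline when CMPXCHG16B
// is enabled (e.g. -mcx16); otherwise the link typically fails.
// dst must be 16-byte aligned.
__uint128_t fetch_add_128b(__uint128_t *dst, __uint128_t src)
{
    __uint128_t dstval, olddst;

    dstval = *dst;   // initial guess; the compare-and-swap validates it

    do
    {
        olddst = dstval;
        // returns the value *dst held before the call; the swap
        // happened only if that value equals olddst
        dstval = __sync_val_compare_and_swap(dst, olddst, olddst + src);
    }
    while(dstval != olddst);

    return dstval;
}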

Answer 2

That isn't possible. There is no x86-64 instruction that does a 128-bit add in one instruction, and for something to be atomic, a basic starting point is that it is a single instruction (some instructions aren't atomic even then, but that's another matter).

You will need to use some other lock around the 128-bit number.
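The locking approach can be sketched as follows, assuming GCC or Clang on a 64-bit target (for the unsigned __int128 extension); the class name LockedU128 is illustrative. Every read-modify-write of the value happens under one std::mutex, so the 128-bit add no longer has to be a single instruction, at the cost of not being lock-free.

```cpp
#include <mutex>

// Hypothetical wrapper: a 128-bit counter guarded by a mutex.
class LockedU128 {
public:
    explicit LockedU128(unsigned __int128 v = 0) : value_(v) {}

    // Adds inc and returns the value held before the addition.
    unsigned __int128 fetch_add(unsigned __int128 inc) {
        std::lock_guard<std::mutex> guard(mutex_);
        unsigned __int128 old = value_;
        value_ = old + inc;
        return old;
    }

    // Reads must take the same lock, or they could see a half-written value.
    unsigned __int128 load() {
        std::lock_guard<std::mutex> guard(mutex_);
        return value_;
    }

private:
    std::mutex mutex_;
    unsigned __int128 value_;
};
```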

Edit: it may be possible to build something along these lines:

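// arg1: the 128-bit target, uint64_t[2] (low half first), 16-byte aligned
// arg2: the 128-bit increment, uint64_t[2] (low half first)
uint64_t oldlo, oldhi;   // receives the old value of arg1 (RDX:RAX)
__asm__ __volatile__(
    "     mov            (%[a1]), %%rax\n"
    "     mov            8(%[a1]), %%rdx\n"
    "1:\n"
    "     mov            %[inclo], %%rbx\n"   // reload the increment: RCX:RBX
    "     mov            %[inchi], %%rcx\n"   // holds the sum after a failed try
    "     add            %%rax, %%rbx\n"
    "     adc            %%rdx, %%rcx\n"
    "     lock cmpxchg16b (%[a1])\n"         // on failure, loads RDX:RAX
    "     jnz            1b\n"
    : "=&a"(oldlo), "=&d"(oldhi)
    : [a1] "r"(arg1), [inclo] "r"(arg2[0]), [inchi] "r"(arg2[1])
    : "rbx", "rcx", "cc", "memory");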

That's just something I hacked up; I haven't compiled it, never mind verified that it works. But the principle is that it repeats until the comparison succeeds.
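The same "retry until the compare succeeds" principle can be written portably with std::atomic<T>::compare_exchange_weak, which, like CMPXCHG16B, writes the value it actually found back into its expected argument when it fails. The helper name fetch_add_cas is illustrative, and the sketch is demonstrated with std::uint64_t, which is lock-free on x86-64. Instantiating it with unsigned __int128 uses the identical loop, but whether that is lock-free, or links at all without libatomic, depends on the compiler and its flags, which is exactly the problem in the question.

```cpp
#include <atomic>

// Hypothetical helper: atomic fetch-add implemented as a CAS retry loop.
// Returns the value held before the addition.
template <typename T>
T fetch_add_cas(std::atomic<T>& target, T inc) {
    T expected = target.load(std::memory_order_relaxed);
    // On failure, compare_exchange_weak stores the current value into
    // expected, so the next attempt uses fresh data without another load.
    while (!target.compare_exchange_weak(expected, expected + inc)) {
        // retry
    }
    return expected;
}
```

For example, with std::atomic<std::uint64_t> a{40}, the call fetch_add_cas<std::uint64_t>(a, 2) returns 40 and leaves 42 in a.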

Edit2: Darn, typing too slowly: Cory Nelson just posted the same thing, but using intrinsics.

Edit3: Updated the loop so it no longer re-reads memory that doesn't need re-reading; when the comparison fails, CMPXCHG16B loads the current value into RDX:RAX for us.

Answer 3

Yes; you need to tell your compiler that you're on hardware that supports it.

This answer assumes you're on x86-64; there's likely a similar spec for ARM.

Among the generic x86-64 microarchitecture levels, you'll want at least x86-64-v2 so the compiler knows you have the cmpxchg16b instruction.

Here's a working Godbolt example; note the compiler flag -march=x86-64-v2: https://godbolt.org/z/PvaojqGcx
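One way to act on this in code is to check at compile time whether the compiler was told it may use CMPXCHG16B. As far as I know, GCC and Clang predefine __GCC_HAVE_SYNC_COMPARE_AND_SWAP_16 when the 16-byte __sync compare-and-swap is available inline (for example with -mcx16 or -march=x86-64-v2). The sketch below, with the illustrative name fetch_add_u128, uses a lock-free __sync_val_compare_and_swap loop in that case and falls back to a mutex otherwise, so it builds either way; only the lock-free path depends on the flag.

```cpp
#include <mutex>

// Hypothetical helper: atomically adds inc to *dst and returns the old value.
// *dst must be 16-byte aligned (unsigned __int128 normally is on x86-64).
unsigned __int128 fetch_add_u128(unsigned __int128* dst, unsigned __int128 inc) {
#if defined(__GCC_HAVE_SYNC_COMPARE_AND_SWAP_16)
    // The compiler may emit CMPXCHG16B: lock-free CAS retry loop.
    unsigned __int128 old = 0;  // initial guess; the first CAS corrects it
    for (;;) {
        unsigned __int128 seen = __sync_val_compare_and_swap(dst, old, old + inc);
        if (seen == old) return old;  // *dst held old and now holds old + inc
        old = seen;                   // wrong guess or lost a race: retry
    }
#else
    // No inline 16-byte CAS: serialize updates with one global mutex.
    // Every access to *dst must go through this function for this to be safe.
    static std::mutex m;
    std::lock_guard<std::mutex> guard(m);
    unsigned __int128 old = *dst;
    *dst = old + inc;
    return old;
#endif
}
```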

For further reading on the x86-64 psABI, the spec is published here.

3 answers

Answer 1: score 8, accepted, 2013-08-11 23:31:07

Answer 2: score 2, 2013-08-11 23:16:01

Answer 3: score 0, 2021-11-19 18:40:14