How to use atomic integers on machines that lack stdatomic.h?

Question

I have developed a multithreaded program that depends on the availability of atomic_int, atomic_store and atomic_load from stdatomic.h. The program is compiled with GCC.

I then tried, unsuccessfully, to compile the program on several old operating system versions that lack stdatomic.h. Unfortunately, it is a requirement that I be able to compile the program on old machines as well, so it is not enough to compile it on a new operating system version and run the binary on an old one.

Is there a way to emulate stdatomic.h on older machines, perhaps with some GCC-specific built-in function?

While installing a newer version of GCC on an old operating system might be a solution, the current build system has calls to "gcc" hardcoded all over it, and the new GCC would have to be compiled from source because old operating systems don't have it in their package management systems. So, ideally, an answer would be something that works on old GCC versions.

Answer 1

While this is not a completely drop-in solution for all applications, I found a way that supports the required basic functionality and passes at least some rudimentary multi-threading tests:

#define _Atomic(T) struct { volatile __typeof__(T) __val; }

typedef _Atomic(int) atomic_int;

#define atomic_load(object) \
    __sync_fetch_and_add(&(object)->__val, 0)

#define atomic_store(object, desired) do { \
    __sync_synchronize(); \
    (object)->__val = (desired); \
    __sync_synchronize(); \
} while (0)

The __sync_synchronize and __sync_fetch_and_add calls are necessary, or else communication between threads fails (I only tested removing both of them, not removing just one).

I'm not very confident, however, that this solution works in all cases. I found it at https://gist.github.com/nhatminhle/5181506, where the author doesn't recommend it for old GCC versions.

In theory, you could also use a mutex. However, mutexes perform worse than atomics.
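For comparison, a minimal sketch of the mutex alternative, assuming POSIX threads are available on the old machines. The type and function names (locked_int, locked_int_load, locked_int_store) are made up for illustration and are not part of any library:

```c
#include <pthread.h>

/* Hypothetical mutex-protected integer; all names are illustrative only. */
typedef struct {
    pthread_mutex_t lock;
    int val;
} locked_int;

/* Static initializer, usable for globals and locals alike. */
#define LOCKED_INT_INIT(v) { PTHREAD_MUTEX_INITIALIZER, (v) }

/* Read the value while holding the lock. */
static int locked_int_load(locked_int *obj)
{
    int v;
    pthread_mutex_lock(&obj->lock);
    v = obj->val;
    pthread_mutex_unlock(&obj->lock);
    return v;
}

/* Write the value while holding the lock. */
static void locked_int_store(locked_int *obj, int desired)
{
    pthread_mutex_lock(&obj->lock);
    obj->val = desired;
    pthread_mutex_unlock(&obj->lock);
}
```

Every load and store pays for a lock and an unlock here, which is the overhead that the atomics-based approach avoids.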

Edit:

It is also possible to implement atomic_store in the following way:

#define atomic_store(object, desired) do { \
    for (;;) \
    { \
        __typeof__((object)->__val) oldval = atomic_load(object); \
        if (__sync_bool_compare_and_swap(&(object)->__val, oldval, (desired))) \
        { \
            break; \
        } \
    } \
} while (0)

However, that consistently reduced performance from 139280.5 ops/second (standard deviation 1799.6 ops/second) to 131805.6 ops/second (standard deviation 986.03 ops/second), so the reduction is statistically significant.

Edit 2:

The loop approach compiles to the following assembly code:

.globl signal_completion
        .type   signal_completion, @function
signal_completion:
.LFB18:
        leaq    4(%rdi), %rcx
.L42:
        xorl    %eax, %eax
        lock
        xaddl   %eax, (%rcx)
        movl    $1, %edx
        movl    %eax, -4(%rsp)
        movl    -4(%rsp), %eax
        lock
        cmpxchgl        %edx, (%rcx)
        jne     .L42
        rep ; ret
.LFE18:
        .size   signal_completion, .-signal_completion
        .p2align 4,,15

The __sync_synchronize approach, by contrast, compiles to the following code:

.globl signal_completion
        .type   signal_completion, @function
signal_completion:
.LFB18:
        movl    $1, 4(%rdi)
        ret
.LFE18:
        .size   signal_completion, .-signal_completion
        .p2align 4,,15

...and on a machine that has stdatomic.h, it compiles to this:

        .globl  signal_completion
        .type   signal_completion, @function
signal_completion:
.LFB43:
        .cfi_startproc
        movl    $1, 4(%rdi)
        mfence
        ret
        .cfi_endproc
.LFE43:
        .size   signal_completion, .-signal_completion

So, the only thing I'm really lacking is the mfence instruction, which the old GCC does not emit for __sync_synchronize here. I guess it could be added using simple inline assembly, for example:

asm volatile ("mfence" ::: "memory");

...placed after the second __sync_synchronize() in the atomic_store definition.
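Put together, the store with the extra fence might look like the sketch below. This is only an illustration under a few assumptions: the fenced_ names and the FULL_FENCE macro are invented here to avoid clashing with the real stdatomic.h, and mfence is only emitted when the target is x86-64 or has SSE2; on other targets the sketch falls back to __sync_synchronize() alone.

```c
/* Hypothetical names; same layout as the _Atomic(int) emulation. */
typedef struct { volatile int val; } fenced_atomic_int;

/* mfence exists on x86-64 and on 32-bit x86 with SSE2.
   Elsewhere, fall back to the GCC builtin barrier. */
#if defined(__x86_64__) || defined(__SSE2__)
#  define FULL_FENCE() __asm__ __volatile__ ("mfence" ::: "memory")
#else
#  define FULL_FENCE() __sync_synchronize()
#endif

/* Atomic read expressed as "add zero and return the old value". */
#define fenced_atomic_load(obj) \
    __sync_fetch_and_add(&(obj)->val, 0)

/* Plain store bracketed by barriers, followed by the explicit fence
   that old GCC versions do not emit on their own. */
#define fenced_atomic_store(obj, desired) do { \
    __sync_synchronize(); \
    (obj)->val = (desired); \
    __sync_synchronize(); \
    FULL_FENCE(); \
} while (0)
```

On a newer GCC, where __sync_synchronize() already emits a fence on x86, the extra FULL_FENCE() is redundant but harmless.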

Edit 3:

Apparently, the __sync_fetch_and_add is not optimized away, as a loop that polls a variable has this assembly output:

.L29:
        xorl    %eax, %eax
        lock
        xaddl   %eax, (%rdi)
        testl   %eax, %eax
        je      .L29

By defining atomic_load like this instead:

#define atomic_load(object) ((object)->__val)

you will get:

.L29:
        movl    (%rdi), %eax
        testl   %eax, %eax
        je      .L29

which is equivalent to the assembly on a machine that supports stdatomic.h:

.L38:
        movl    (%rdi), %eax
        testl   %eax, %eax
        je      .L38

Strangely enough, the __sync_fetch_and_add variant seems to run faster on my machine and in my benchmark, even though its code is more complex. Strange world, isn't it?

Answer 2

The best thing is to roll your own wrapper: use stdatomic.h when it is available, and otherwise emulate the operations using mutexes or platform-specific instructions.
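A minimal sketch of such a wrapper follows. The header name, the COMPAT_HAVE_STDATOMIC macro and the compat_ prefix are all invented for illustration. Detection relies on __has_include, which only newer compilers provide, so an old GCC is assumed to fall through to the fallback branch, which here uses the legacy __sync builtins from Answer 1 rather than a mutex. It has not been tested on old GCC versions.

```c
/* compat_atomic.h: hypothetical wrapper header (names are illustrative). */
#ifndef COMPAT_ATOMIC_H
#define COMPAT_ATOMIC_H

/* Detect <stdatomic.h>. The check is nested so that preprocessors
   without __has_include never have to parse the inner expression. */
#if defined(__has_include)
#  if __has_include(<stdatomic.h>)
#    define COMPAT_HAVE_STDATOMIC 1
#  endif
#endif

#if defined(COMPAT_HAVE_STDATOMIC)

/* Newer toolchain: forward directly to the C11 facilities. */
#  include <stdatomic.h>
typedef atomic_int compat_atomic_int;
#  define compat_atomic_load(obj)           atomic_load(obj)
#  define compat_atomic_store(obj, desired) atomic_store((obj), (desired))

#else

/* Older toolchain: emulate with the legacy GCC __sync builtins. */
typedef struct { volatile int val; } compat_atomic_int;
#  define compat_atomic_load(obj) \
       __sync_fetch_and_add(&(obj)->val, 0)
#  define compat_atomic_store(obj, desired) do { \
       __sync_synchronize(); \
       (obj)->val = (desired); \
       __sync_synchronize(); \
   } while (0)

#endif

#endif /* COMPAT_ATOMIC_H */
```

The program would then use compat_atomic_int, compat_atomic_load and compat_atomic_store everywhere instead of the stdatomic.h names, so the same source compiles on both old and new machines.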
