How is atomic_flag implemented?

Question

How is atomic_flag implemented? It feels to me that on x86-64 it is equivalent to atomic_bool anyway, but that is just a guess. Might the x86-64 implementation be any different from the ARM or x86 ones?

Answer 1

Yeah, on normal CPUs where atomic<bool> and atomic<int> are also lock-free, it's pretty much like atomic<bool>, using the same instructions. (x86 and x86-64 have the same set of atomic operations available.)

You might think that it would always use x86 lock bts or lock btr to set / reset (clear) a single bit, but it can be more efficient to do other things (especially for a function that returns a bool instead of branching on it). The object is a whole byte, so you can just store or exchange the whole byte. (And if the ABI guarantees that the value is always 0 or 1, you don't have to booleanize it before returning the result as a bool.)

GCC and clang compile test_and_set to a byte exchange, and clear to a byte store of 0. We get (nearly) identical asm for atomic_flag test_and_set as for f.exchange(true):

#include <atomic>

bool TAS(std::atomic_flag &f) {
    return f.test_and_set();
}

bool TAS_bool(std::atomic<bool> &f) {
    return f.exchange(true);
}


void clear(std::atomic_flag &f) {
    //f = 0; // deleted
    f.clear();
}

void clear_relaxed(std::atomic_flag &f) {
    f.clear(std::memory_order_relaxed);
}

void bool_clear(std::atomic<bool> &f) {
    f = false; // allowed for atomic<bool> (a seq_cst store); only atomic_flag deletes assignment
}

On Godbolt for x86-64 with gcc and clang, and for ARMv7 and AArch64:

## GCC9.2 -O3 for x86-64
TAS(std::atomic_flag&):
        mov     eax, 1
        xchg    al, BYTE PTR [rdi]
        ret
TAS_bool(std::atomic<bool>&):
        mov     eax, 1
        xchg    al, BYTE PTR [rdi]
        test    al, al
        setne   al                      # missed optimization, doesn't need to booleanize to 0/1
        ret
clear(std::atomic_flag&):
        mov     BYTE PTR [rdi], 0
        mfence                          # memory fence to drain store buffer before future loads
        ret
clear_relaxed(std::atomic_flag&):
        mov     BYTE PTR [rdi], 0      # x86 stores are already mo_release, no barrier
        ret
bool_clear(std::atomic<bool>&):
        mov     BYTE PTR [rdi], 0
        mfence
        ret

Note that xchg is also an efficient way to do a seq_cst store on x86-64, usually more efficient than the mov + mfence that gcc uses. Clang uses xchg for all of these (except the relaxed store).
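To compare the two ways of doing a sequentially consistent store yourself, a sketch like the following can be pasted into Godbolt. The function names are hypothetical and not part of the code above, and the instructions you actually get depend on the compiler and its version.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical names for illustration; not from the original answer.

// Plain seq_cst store. The GCC 9.2 listing above shows mov + mfence for the
// equivalent "f = false" on x86-64.
void store_seq_cst(std::atomic<bool> &f) {
    f.store(false, std::memory_order_seq_cst);
}

// Exchange with the old value discarded. It leaves the same value in memory
// as the store above; on x86-64 an exchange is an xchg with memory.
void store_via_exchange(std::atomic<bool> &f) {
    (void)f.exchange(false, std::memory_order_seq_cst);
}

// Release store: weaker ordering, so no barrier is needed on x86-64.
void store_release(std::atomic<bool> &f) {
    f.store(false, std::memory_order_release);
}

// All three leave the flag false; they differ in ordering and generated code.
void demo_stores() {
    std::atomic<bool> f{true};
    store_via_exchange(f);
    assert(!f.load());
}
```

Writing the exchange by hand is only a way to look at the alternative code generation; choosing between the two forms is normally the compiler's job.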

Amusingly, clang re-booleanizes to 0/1 after the xchg in atomic_flag.test_and_set(), whereas GCC does it after the atomic<bool> exchange instead. clang does a weird and al,1 in TAS_bool, which would treat values like 2 as false. It seems totally pointless; the ABI guarantees that a bool in memory is always stored as a 0 or 1 byte.

For ARM, we have ldrexb / strexb exchange retry loops, or just strb + dmb ish for the pure store. Or AArch64 can use stlrb wzr, [x0] for clear or assign-false to do a sequential-release store (of the zero register) without needing a barrier.
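To tie those instructions back to typical use, here is a minimal spinlock sketch built on test_and_set and clear. The SpinLock class and demo_spinlock are hypothetical names for illustration, not something from the question; acquire/release ordering is an assumption that fits a lock, whereas the functions above use the default seq_cst.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical illustration type; not from the original answer.
class SpinLock {
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;  // starts in the clear state

public:
    // test_and_set returns the previous value, so "false" means we just took the lock.
    bool try_lock() {
        return !flag_.test_and_set(std::memory_order_acquire);
    }

    // Retry the byte exchange until it observes the flag clear.
    void lock() {
        while (!try_lock()) {
            std::this_thread::yield();
        }
    }

    // Clearing is a plain store of 0 with release ordering.
    void unlock() {
        flag_.clear(std::memory_order_release);
    }
};

// Single-threaded walk through the state changes.
void demo_spinlock() {
    SpinLock lock;
    assert(lock.try_lock());   // flag was clear, now set
    assert(!lock.try_lock());  // already held
    lock.unlock();             // flag clear again
    assert(lock.try_lock());
    lock.unlock();
}
```

Because it provides lock, try_lock, and unlock, a type like this can also be used with std::lock_guard.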

Answer 2

On most/sane architectures, an interrupt can happen before or after a hardware instruction is executed, not "in between" its execution. So either the instruction "happens" (i.e. with its side effects) or it does not happen.

For example, a 16-bit architecture most probably has hardware instructions that operate on 16-bit variables with a single instruction. So incrementing a 16-bit variable will be a single instruction, storing a value in a 16-bit variable will be a single instruction, etc. Locking is not needed for 16-bit variables, as the increment either happens or does not happen, atomically. It's impossible on this architecture to observe the "mid-execution" state of an increment of a 16-bit variable. It is a single instruction; it can't be interrupted "in between" by any signal or interrupt.

A 16-bit architecture may lack an instruction that increments a 64-bit variable in a single step. It may need many, many instructions to do operations on 64-bit variables. So operations on std::atomic<uint64_t> need additional synchronization instructions inserted by the compiler to implement its functionality, to implement synchronization with other std::atomic variables, etc.

But operations on 16-bit variables on this architecture are single instructions, so the compiler doesn't need to do anything special with them; the side effects will always be visible everywhere after the instruction executes.
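Whether a given width needs that extra machinery on a particular target can be queried rather than guessed. The sketch below uses the standard is_lock_free() member; is_lock_free_on_this_target and demo_wide_counter are hypothetical helper names, and the answers depend entirely on the compiler and target you build for.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical helper (not from the answer): asks the implementation whether
// std::atomic<T> is lock-free on the build target.
template <typename T>
bool is_lock_free_on_this_target() {
    std::atomic<T> probe{T{}};
    return probe.is_lock_free();
}

// The source code is the same either way: where the type is not lock-free,
// the implementation supplies the extra synchronization behind the same API.
void demo_wide_counter() {
    std::atomic<std::uint64_t> counter{0};
    counter.fetch_add(1);
    assert(counter.load() == 1);
}
```

Calling is_lock_free_on_this_target<std::uint16_t>() and is_lock_free_on_this_target<std::uint64_t>() on the target of interest shows which widths it handles natively. On some toolchains, types that are not lock-free may additionally require linking a support library.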

So atomic_flag is most probably just a variable that has the size of a word on the particular processor, so that the processor can operate on it with single instructions. In practice that is an int, but int is not guaranteed to correspond to the word size of the processor, and accesses to an int are not guaranteed to be atomic. I believe atomic_flag is typically the same as sig_atomic_t from POSIX (see the POSIX docs). Additionally, atomic_flag constrains its operations to bool-ish ones only: clear, set, and notify.
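As an illustration of how small that interface is, the sketch below uses only test_and_set and clear, the two operations available since C++11 (test, wait, and the notify functions were added in C++20). The claim_once helper is a hypothetical name, not from the question.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper: returns true only for the first caller since the last
// clear(), because test_and_set reports the value the flag had before the call.
bool claim_once(std::atomic_flag &claimed) {
    return !claimed.test_and_set();
}

void demo_claim_once() {
    std::atomic_flag claimed = ATOMIC_FLAG_INIT;  // starts clear
    assert(claim_once(claimed));    // first caller wins
    assert(!claim_once(claimed));   // later callers see it already set
    claimed.clear();                // reset the flag
    assert(claim_once(claimed));    // claimable again
}
```

There is no assignment and no plain load before C++20, which is why test_and_set has to double as the way to read the flag here.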

2 answers

Answer 1: 7, accepted, 2020-01-05 16:08:56

Answer 2: 0, 2020-01-05 16:10:48
