How is atomic_flag implemented?
How is atomic_flag implemented? It feels to me that on x86-64 it is equivalent to atomic_bool anyway, but that is just a guess. Might the x86-64 implementation be any different from ARM or x86?
Yeah, on normal CPUs where atomic<bool> and atomic<int> are also lock-free, it's pretty much like atomic<bool>, using the same instructions. (x86 and x86-64 have the same set of atomic operations available.)
You might think that it would always use x86 lock bts or lock btr to set / reset (clear) a single bit, but it can be more efficient to do other things (especially for a function that returns a bool instead of branching on it). The object is a whole byte, so you can just store or exchange the whole byte. (And if the ABI guarantees that the value is always 0 or 1, you don't have to booleanize it before returning the result as a bool.)
GCC and clang compile test_and_set to a byte exchange, and clear to a byte store of 0. We get (nearly) identical asm for atomic_flag test_and_set as for f.exchange(true) on an atomic<bool>:
#include <atomic>

bool TAS(std::atomic_flag &f) {
    return f.test_and_set();
}

bool TAS_bool(std::atomic<bool> &f) {
    return f.exchange(true);
}

void clear(std::atomic_flag &f) {
    //f = 0;  // would not compile: assignment is deleted for atomic_flag
    f.clear();
}

void clear_relaxed(std::atomic_flag &f) {
    f.clear(std::memory_order_relaxed);
}

void bool_clear(std::atomic<bool> &f) {
    f = false;  // allowed for atomic<bool>: a seq_cst store
}
On Godbolt for x86-64 with gcc and clang, and for ARMv7 and AArch64.
## GCC9.2 -O3 for x86-64
TAS(std::atomic_flag&):
    mov     eax, 1
    xchg    al, BYTE PTR [rdi]
    ret
TAS_bool(std::atomic<bool>&):
    mov     eax, 1
    xchg    al, BYTE PTR [rdi]
    test    al, al
    setne   al                      # missed optimization, doesn't need to booleanize to 0/1
    ret
clear(std::atomic_flag&):
    mov     BYTE PTR [rdi], 0
    mfence                          # memory fence to drain store buffer before future loads
    ret
clear_relaxed(std::atomic_flag&):
    mov     BYTE PTR [rdi], 0       # x86 stores are already mo_release, no barrier
    ret
bool_clear(std::atomic<bool>&):
    mov     BYTE PTR [rdi], 0
    mfence
    ret
Note that xchg is also an efficient way to do a seq_cst store on x86-64, usually more efficient than the mov + mfence that gcc uses. Clang uses xchg for all of these (except the relaxed store).
Amusingly, clang re-booleanizes to 0/1 after the xchg in atomic_flag.test_and_set(), but GCC instead does it after the atomic<bool> exchange. clang does a weird and al,1 in TAS_bool, which would treat values like 2 as false. It seems totally pointless; the ABI guarantees that a bool in memory is always stored as a 0 or 1 byte.
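To make the difference between the two booleanizing sequences concrete, here is a small model of them in plain C++. The helper names are hypothetical and the functions only imitate what test/setne and and al,1 compute on a raw byte; they are not what either compiler generates from.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers modelling the two booleanization strategies.

// Models "test al, al / setne al": any non-zero byte becomes true.
bool nonzero_to_bool(std::uint8_t raw) {
    return raw != 0;
}

// Models "and al, 1": only the low bit is kept.
bool low_bit_to_bool(std::uint8_t raw) {
    return (raw & 1u) != 0;
}

// The two agree on the only values a valid bool byte can hold (0 and 1),
// and differ on an out-of-range byte such as 2.
void check_booleanize() {
    assert(nonzero_to_bool(0) == low_bit_to_bool(0));
    assert(nonzero_to_bool(1) == low_bit_to_bool(1));
    assert(nonzero_to_bool(2) != low_bit_to_bool(2));
}
```

Since the ABI only allows 0 or 1 in a bool byte, neither step changes the result for valid inputs, which is why both are redundant here.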
For ARM, we have ldrexb / strexb exchange retry loops, or just strb + dmb ish for the pure store. Or AArch64 can use stlrb wzr, [x0] for clear or assign-false to do a sequential-release store (of the zero register) without needing a barrier.
On most/sane architectures an interrupt can happen after or before a hardware instruction is executed, not "in between" its execution. So the instruction either "happens" (i.e. with its side effects) or does not happen.
For example, a 16-bit architecture most probably has hardware instructions that operate on 16-bit variables in a single instruction. So incrementing a 16-bit variable will be a single instruction, storing a value in a 16-bit variable will be a single instruction, and so on. Locking is not needed for 16-bit variables, as the increment either happens or does not happen, atomically. It's impossible on this architecture to observe the "mid-execution" state of an increment of a 16-bit variable: it is a single instruction, and it can't be interrupted "in between" by any signal or interrupt.
A 16-bit architecture may lack an instruction to increment a 64-bit variable in one step. It may need many instructions to do operations on 64-bit variables. So operations on std::atomic<uint64_t> need additional synchronization instructions inserted by the compiler to implement its functionality, to implement synchronization with other std::atomic variables, etc.
But operations on 16-bit variables on this architecture are single instructions; the compiler doesn't need to do anything with them, and the side effects will always be visible everywhere after the instruction executes.
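Whether a given width needs extra machinery on a particular target can be queried from the standard library rather than guessed. The sketch below assumes C++17 (for is_always_lock_free); the helper name is made up for illustration, and the answers it gives differ from platform to platform.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical helper: asks at run time whether atomic<T> is implemented
// without a lock on the machine this program runs on.
template <typename T>
bool runtime_lock_free() {
    std::atomic<T> a{T{}};
    return a.is_lock_free();
}

// If the compile-time answer is "always lock-free", the run-time answer
// must agree. The reverse does not have to hold.
void check_lock_free_consistency() {
    if (std::atomic<std::uint16_t>::is_always_lock_free) {
        assert(runtime_lock_free<std::uint16_t>());
    }
    if (std::atomic<std::uint64_t>::is_always_lock_free) {
        assert(runtime_lock_free<std::uint64_t>());
    }
}
```

std::atomic_flag has no such query because the standard guarantees that it is lock-free everywhere.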
So atomic_flag is most probably just a variable that has the size of the word on a particular processor, so that the processor can operate on it with single instructions. In practice that is an int, but int is not guaranteed to correspond to the word size of the processor, and accesses to int objects are not guaranteed to be atomic. I believe atomic_flag is typically the same as sig_atomic_t from POSIX (POSIX docs). Additionally, atomic_flag constrains its operations to bool-ish ones only: clear, set and notify.
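As a minimal sketch of that restricted interface, the snippet below uses only test_and_set and clear, which are available since C++11 (the wait / notify_one members arrived in C++20 and are left out here). The wrapper names and the choice of acquire/release ordering are assumptions made for this example.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical wrappers, for illustration only.

// Sets the flag and reports whether this call was the one that changed it
// from clear to set. test_and_set returns the previous value.
bool try_acquire(std::atomic_flag &f) {
    return !f.test_and_set(std::memory_order_acquire);
}

// Puts the flag back into the clear state.
void release(std::atomic_flag &f) {
    f.clear(std::memory_order_release);
}

void check_flag_ops() {
    std::atomic_flag f = ATOMIC_FLAG_INIT;  // starts in the clear state
    assert(try_acquire(f));    // clear -> set: we got it
    assert(!try_acquire(f));   // already set: we did not
    release(f);
    assert(try_acquire(f));    // clear again, so this succeeds
}
```

There is no way to assign to the flag or read it as an integer, so set, clear and (in C++20) wait/notify are all the type offers.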