How is atomic_flag implemented? It feels to me that on x86-64 it is equivalent to atomic_bool anyway, but that is just a guess. Might the x86-64 implementation differ from the ARM or x86 one?
Yeah, on normal CPUs where atomic<bool> and atomic<int> are also lock-free, it's pretty much like atomic<bool>, using the same instructions. (x86 and x86-64 have the same set of atomic operations available.)
You might think that it would always use x86 lock bts or lock btr to set / reset (clear) a single bit, but it can be more efficient to do other things (especially for a function that returns a bool instead of branching on it). The object is a whole byte, so you can just store or exchange the whole byte. (And if the ABI guarantees that the value is always 0 or 1, you don't have to booleanize it before returning the result as a bool.)
GCC and clang compile test_and_set to a byte exchange, and clear to a byte store of 0. We get (nearly) identical asm for atomic_flag test_and_set as for f.exchange(true) on an atomic<bool>:
    #include <atomic>

    bool TAS(std::atomic_flag &f) {
        return f.test_and_set();
    }

    bool TAS_bool(std::atomic<bool> &f) {
        return f.exchange(true);
    }

    void clear(std::atomic_flag &f) {
        //f = 0;   // deleted: atomic_flag has no assignment from a value
        f.clear();
    }

    void clear_relaxed(std::atomic_flag &f) {
        f.clear(std::memory_order_relaxed);
    }

    void bool_clear(std::atomic<bool> &f) {
        f = false;   // allowed for atomic<bool>: a seq_cst store
    }
On Godbolt for x86-64 with gcc and clang, and for ARMv7 and AArch64.
## GCC9.2 -O3 for x86-64
    TAS(std::atomic_flag&):
        mov     eax, 1
        xchg    al, BYTE PTR [rdi]
        ret
    TAS_bool(std::atomic<bool>&):
        mov     eax, 1
        xchg    al, BYTE PTR [rdi]
        test    al, al
        setne   al                      # missed optimization, doesn't need to booleanize to 0/1
        ret
    clear(std::atomic_flag&):
        mov     BYTE PTR [rdi], 0
        mfence                          # memory fence to drain store buffer before future loads
        ret
    clear_relaxed(std::atomic_flag&):
        mov     BYTE PTR [rdi], 0       # x86 stores are already mo_release, no barrier
        ret
    bool_clear(std::atomic<bool>&):
        mov     BYTE PTR [rdi], 0
        mfence
        ret
Note that xchg is also an efficient way to do a seq_cst store on x86-64, usually more efficient than the mov + mfence that gcc uses. Clang uses xchg for all of these (except the relaxed store).
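To compare the two code-generation strategies yourself, here is a minimal sketch that expresses the same seq_cst store two ways at the source level. The function names store_seq_cst and store_via_exchange are made up for illustration, and which instructions you actually get depends on the compiler and its version.

```cpp
#include <atomic>

// Plain seq_cst store. In the GCC 9.2 listing above, this kind of store
// became mov + mfence; clang is described as using xchg instead.
void store_seq_cst(std::atomic<bool> &f, bool v) {
    f.store(v, std::memory_order_seq_cst);
}

// Exchange with the old value discarded. The observable effect is the same
// as the store above; on x86-64 an exchange is done with xchg.
void store_via_exchange(std::atomic<bool> &f, bool v) {
    (void)f.exchange(v, std::memory_order_seq_cst);
}
```

Pasting both functions into a compiler explorer side by side shows whether your compiler treats them differently.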
Amusingly, clang re-booleanizes to 0/1 after the xchg in atomic_flag.test_and_set(), but GCC instead does it after the one in atomic<bool>. clang does a weird and al,1 in TAS_bool, which would treat values like 2 as false. It seems totally pointless; the ABI guarantees that a bool in memory is always stored as a 0 or 1 byte.
For ARM, we have ldrexb / strexb exchange retry loops, or just strb + dmb ish for the pure store. Or AArch64 can use stlrb wzr, [x0] for clear or assign-false to do a sequential-release store (of the zero register) without needing a barrier.
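For context on where these instruction sequences typically end up, here is a minimal spinlock sketch built only from test_and_set and clear. SpinLock is a hypothetical class written for this illustration, not something from the standard library, and it uses acquire/release ordering rather than the seq_cst default shown in the listings.

```cpp
#include <atomic>

// Hypothetical minimal spinlock: the flag is set while the lock is held.
class SpinLock {
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;   // starts clear (unlocked)

public:
    // Returns true if the lock was free and is now held by the caller.
    bool try_lock() {
        return !flag_.test_and_set(std::memory_order_acquire);
    }

    // Spin until test_and_set reports that the flag was previously clear.
    void lock() {
        while (flag_.test_and_set(std::memory_order_acquire)) {
            // busy-wait; a real lock would back off or yield here
        }
    }

    // Release the lock with a release store of "clear".
    void unlock() {
        flag_.clear(std::memory_order_release);
    }
};
```

On x86-64 the loop body would be an exchange, and on ARM the kind of ldrexb / strexb retry loop described above; the clear is the pure store.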
On most/sane architectures an interrupt can happen after or before a hardware instruction is executed, not "in between" its execution. So either the instruction "happens" (i.e. with its side effects) or it does not happen.
For example, a 16-bit architecture most probably has hardware instructions that operate on 16-bit variables in a single instruction, so incrementing a 16-bit variable or storing a value to one is a single instruction. Locking is not needed for 16-bit variables, because the increment either happens or does not happen, atomically. It is impossible on this architecture to observe the mid-execution state of an increment of a 16-bit variable: it is a single instruction and can't be interrupted "in between" by any signal or interrupt.
A 16-bit architecture may lack an instruction to increment a 64-bit variable in one step. It may need many instructions to operate on 64-bit variables. So operations on std::atomic<uint64_t> need additional synchronization instructions inserted by the compiler to implement its functionality, to implement synchronization with other std::atomic variables, etc.
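Whether a given target needs that extra synchronization for a 64-bit atomic can be queried at run time. The sketch below is only illustrative: u64_is_lock_free and bump are hypothetical helper names, and on some targets linking may require the toolchain's atomic support library.

```cpp
#include <atomic>
#include <cstdint>

// Reports whether this implementation handles a 64-bit atomic with plain
// atomic instructions (true) or falls back to some form of locking (false).
bool u64_is_lock_free() {
    std::atomic<std::uint64_t> v{0};
    return v.is_lock_free();
}

// Atomic increment that works either way; returns the new value.
// fetch_add returns the value held before the addition, hence the + 1.
std::uint64_t bump(std::atomic<std::uint64_t> &counter) {
    return counter.fetch_add(1) + 1;
}
```

The calling code is the same in both cases; only the instructions the compiler and library generate behind fetch_add differ.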
But operations on 16-bit variables on this architecture are single instructions, so the compiler doesn't need to do anything special with them; the side effects will always be visible everywhere after the instruction executes.
So atomic_flag is most probably just a variable that has the size of the word on a particular processor, so that the processor can operate on it with single instructions. In practice that is an int, but int is not guaranteed to correspond to the word size of the processor, and accesses to an int are not guaranteed to be atomic. I believe atomic_flag is typically the same as sig_atomic_t from POSIX. Additionally, atomic_flag constrains its operations to bool-like ones only: clear, set, and notify.
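The size guess is easy to check on a particular toolchain. The following sketch just collects the relevant sizeof values; FlagSizes and flag_sizes are names invented for this example, and the numbers are implementation-defined, so no specific result is implied.

```cpp
#include <atomic>
#include <csignal>
#include <cstddef>

// Sizes (in bytes) of the types being compared in this discussion.
struct FlagSizes {
    std::size_t flag;         // sizeof(std::atomic_flag)
    std::size_t atomic_bool;  // sizeof(std::atomic<bool>)
    std::size_t sig_atomic;   // sizeof(std::sig_atomic_t)
    std::size_t integer;      // sizeof(int)
};

FlagSizes flag_sizes() {
    return FlagSizes{sizeof(std::atomic_flag), sizeof(std::atomic<bool>),
                     sizeof(std::sig_atomic_t), sizeof(int)};
}
```

Printing the four fields on your own platform shows whether atomic_flag there is word-sized, int-sized, or a single byte.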