Acquire-release on x86
In the Intel Manual Vol. 3 there is an example of loads being reordered with earlier stores.
Initially x = y = 0
Core 1:
mov [x], 1
mov r2, [y]
Core 2:
mov [y], 1
mov r1, [x]
So r1 = r2 = 0 is possible.
The question is whether requiring acquire-release semantics prohibits this scenario. On x86 every store is already a release store, so I think not. Example:
Core 1:
release(mov [x], 1)
mov r2, [y]
Core 2:
mov [y], 1
acquire(mov r1, [x])
In this case, if the load acquire(mov r1, [x]) observes 0, the only possible conclusion is that release(mov [x], 1) does not synchronize with acquire(mov r1, [x]) in the sense of the C11 Standard memory model, and the model gives no guarantee that would prohibit reordering mov [y], 1 with acquire(mov r1, [x]) on Core 2.
Correct: acquire/release semantics cannot prevent StoreLoad reordering, i.e. taking a store followed by a load and interchanging their order. Such reordering is allowed for ordinary load and store instructions on x86.
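As a rough C11 rendering of the question's example (a sketch only: x and y are assumed to be atomic_int objects, and core1 and core2 are hypothetical names for the code each core runs), the release store and acquire load look like this. On x86, compilers typically emit a plain mov for each of these four accesses, so the hardware is still free to produce r1 = r2 = 0.

```c
#include <stdatomic.h>

/* Shared locations from the example; static storage, so both start at 0. */
static atomic_int x, y;

/* Core 1: release store to x, then an ordinary (relaxed) load of y. */
int core1(void)
{
    atomic_store_explicit(&x, 1, memory_order_release);
    return atomic_load_explicit(&y, memory_order_relaxed); /* r2 */
}

/* Core 2: ordinary (relaxed) store to y, then an acquire load of x. */
int core2(void)
{
    atomic_store_explicit(&y, 1, memory_order_relaxed);
    return atomic_load_explicit(&x, memory_order_acquire); /* r1 */
}
```

Called one after the other on a single thread, core1() returns 0 and core2() returns 1. Only when the two run concurrently on different cores can both return 0, and nothing in the release/acquire orderings rules that out.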
If you want to avoid such reordering in C11, you need to use memory_order_seq_cst on both the store and the load. In x86 assembly, you need a barrier between the two instructions. mfence serves this purpose, but so does any locked read-modify-write instruction, including xchg with a memory operand, which does so even without the lock prefix. So if you look at the generated assembly for memory_order_seq_cst operations, you'll see some such barrier in between. (For certain reasons, something like lock add [rsp], 0, or an xchg between some register and memory whose contents are unimportant, can actually be more performant than mfence, so some compilers will do that even though it looks weird.)
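The two fixes can be compared with a small C11 litmus test. This is a sketch rather than a definitive harness: it assumes POSIX threads are available, and names such as core1_seq_cst, core1_fence and count_both_zero are made up for illustration. Variant A uses memory_order_seq_cst on both the store and the load. Variant B keeps relaxed accesses and puts atomic_thread_fence(memory_order_seq_cst) between them, which is the C11 counterpart of placing a barrier between the two instructions.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static atomic_int x, y;  /* shared locations, reset to 0 before each trial */
static int r1, r2;       /* results, read only after both threads are joined */

/* Variant A: seq_cst on both the store and the load. */
static void *core1_seq_cst(void *arg)
{
    (void)arg;
    atomic_store_explicit(&x, 1, memory_order_seq_cst);
    r2 = atomic_load_explicit(&y, memory_order_seq_cst);
    return NULL;
}

static void *core2_seq_cst(void *arg)
{
    (void)arg;
    atomic_store_explicit(&y, 1, memory_order_seq_cst);
    r1 = atomic_load_explicit(&x, memory_order_seq_cst);
    return NULL;
}

/* Variant B: relaxed accesses with a full fence between store and load. */
static void *core1_fence(void *arg)
{
    (void)arg;
    atomic_store_explicit(&x, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
    r2 = atomic_load_explicit(&y, memory_order_relaxed);
    return NULL;
}

static void *core2_fence(void *arg)
{
    (void)arg;
    atomic_store_explicit(&y, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
    r1 = atomic_load_explicit(&x, memory_order_relaxed);
    return NULL;
}

/* Runs the two-thread test `trials` times and counts r1 == r2 == 0 outcomes.
   Returns -1 if a thread cannot be created. */
static int count_both_zero(void *(*f1)(void *), void *(*f2)(void *), int trials)
{
    int hits = 0;
    for (int i = 0; i < trials; i++) {
        pthread_t t1, t2;
        atomic_store(&x, 0);
        atomic_store(&y, 0);
        if (pthread_create(&t1, NULL, f1, NULL) != 0)
            return -1;
        if (pthread_create(&t2, NULL, f2, NULL) != 0) {
            pthread_join(t1, NULL);
            return -1;
        }
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        if (r1 == 0 && r2 == 0)
            hits++;
    }
    return hits;
}

int both_zero_seq_cst(int trials)
{
    return count_both_zero(core1_seq_cst, core2_seq_cst, trials);
}

int both_zero_fence(int trials)
{
    return count_both_zero(core1_fence, core2_fence, trials);
}
```

Under the C11 model, both_zero_seq_cst(n) and both_zero_fence(n) must return 0 whenever the threads can be created. Weakening the orderings to release/acquire and dropping the fence makes a nonzero count legal, although any particular run may or may not observe it.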