Acquire-release on x86

Question

In the Intel Manual Vol. 3 there is an example of loads being reordered with earlier stores.

Initially x = y = 0

Core 1:

mov [x], 1
mov r2, [y]

Core 2:

mov [y], 1
mov r1, [x]

So r1 = r2 = 0 is possible. The question is whether requiring acquire-release semantics prohibits this scenario. On x86 every store is already a release store, so I think it does not. Example:

Core 1:

release(mov [x], 1)
mov r2, [y]

Core 2:

mov [y], 1
acquire(mov r1, [x])

In this case, if acquire(mov r1, [x]) observes 0, the only thing that can be concluded is that release(mov [x], 1) does not synchronize-with acquire(mov r1, [x]) in terms of the C11 Standard memory model. That gives no guarantee that would prohibit reordering mov [y], 1 and acquire(mov r1, [x]) on Core 2.

Answer 1

Correct. Acquire/release semantics cannot prevent StoreLoad reordering, i.e. taking a store followed by a load and interchanging their order. Such reordering is allowed for ordinary load and store instructions on x86.
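To make this concrete, here is a minimal C++ sketch of the question's litmus test (C++ shares the C11 memory model). The function names and the `go` start flag are assumptions introduced for illustration, and the accesses the question leaves plain are written as relaxed atomics to avoid a data race. Whether r1 = r2 = 0 is actually observed depends on the hardware, the compiler, and timing; the point is only that nothing here forbids it.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// One run of the litmus test from the question, with a release store on
// Core 1 and an acquire load on Core 2. Returns true if r1 == r2 == 0.
// Names here are hypothetical, chosen for this sketch.
bool run_once_acq_rel() {
    std::atomic<int> x{0}, y{0};
    std::atomic<bool> go{false};  // start flag so both threads begin together
    int r1 = -1, r2 = -1;

    std::thread core1([&] {
        while (!go.load(std::memory_order_acquire)) {}
        x.store(1, std::memory_order_release);   // release(mov [x], 1)
        r2 = y.load(std::memory_order_relaxed);  // mov r2, [y]
    });
    std::thread core2([&] {
        while (!go.load(std::memory_order_acquire)) {}
        y.store(1, std::memory_order_relaxed);   // mov [y], 1
        r1 = x.load(std::memory_order_acquire);  // acquire(mov r1, [x])
    });

    go.store(true, std::memory_order_release);
    core1.join();
    core2.join();
    return r1 == 0 && r2 == 0;  // allowed outcome: StoreLoad reordering
}

// Repeats the test and counts how often both loads observed 0.
int count_both_zero_acq_rel(int iterations) {
    int hits = 0;
    for (int i = 0; i < iterations; ++i) hits += run_once_acq_rel();
    return hits;
}
```

Calling `count_both_zero_acq_rel(100000)` may return a nonzero count on a multi-core x86 machine, but a result of 0 proves nothing, since the outcome is permitted rather than required.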

If you want to avoid such reordering in C11, you need to use memory_order_seq_cst on both the store and the load. In x86 assembly, you need a barrier in between the two instructions. mfence serves this purpose, but so does any locked read-modify-write instruction, including xchg with a memory operand, which is locked even without the lock prefix. So if you look at the generated assembly for memory_order_seq_cst operations, you'll see some such barrier in between. (For certain reasons, something like lock add [rsp], 0, or an xchg between some register and a memory location whose contents are unimportant, can actually be more performant than mfence, so some compilers will do that even though it looks weird.)
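As an illustration of the seq_cst fix, the sketch below runs the same litmus test with memory_order_seq_cst on all four accesses, written in C++ (which shares the C11 memory model). The function names and the `go` start flag are assumptions made for this example. Under the language model, all seq_cst operations fall into a single total order, so at least one of the two loads must observe a 1.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// One run of the litmus test with every access seq_cst.
// Returns true if r1 == r2 == 0, which the memory model forbids here.
// Names here are hypothetical, chosen for this sketch.
bool run_once_seq_cst() {
    std::atomic<int> x{0}, y{0};
    std::atomic<bool> go{false};  // start flag so both threads begin together
    int r1 = -1, r2 = -1;

    std::thread core1([&] {
        while (!go.load(std::memory_order_acquire)) {}
        // On x86 a seq_cst store is typically compiled to xchg,
        // or to mov followed by mfence.
        x.store(1, std::memory_order_seq_cst);
        r2 = y.load(std::memory_order_seq_cst);
    });
    std::thread core2([&] {
        while (!go.load(std::memory_order_acquire)) {}
        y.store(1, std::memory_order_seq_cst);
        r1 = x.load(std::memory_order_seq_cst);
    });

    go.store(true, std::memory_order_release);
    core1.join();
    core2.join();
    return r1 == 0 && r2 == 0;
}

// Repeats the test and counts how often both loads observed 0.
int count_both_zero_seq_cst(int iterations) {
    int hits = 0;
    for (int i = 0; i < iterations; ++i) hits += run_once_seq_cst();
    return hits;
}
```

Here `count_both_zero_seq_cst(n)` should return 0 for any n, because the barrier between each store and the following load rules out the StoreLoad reordering.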

2 · Accepted · 2022-09-01 14:45:03
