Is a std::memory_order_acquire fence necessary on x86?

Question

Given that x86 has a strong memory model, is a std::memory_order_acquire fence (not an operation) necessary?

For example, if I have this code:

uint32_t read_shm(const uint64_t offset) {
   // m_head_memory_location is char* pointing to the beginning of a mmap-ed named shared memory segment
   // a different process on different core will write to it.
   return *(uint32_t*)(m_head_memory_location + offset);
}
....
int main() {
     uint32_t val = 0;
     while (0 != (val = shm.read(some location)));
     .... // use val
}

Do I really need std::atomic_thread_fence(std::memory_order_acquire) before the return statement?

I feel it is not necessary, because the goal of the code above is just to read the first 4 bytes at m_head_memory_location + offset, so reordering of any memory operations after the fence doesn't affect the outcome.

Or is there some side effect that makes the acquire fence necessary?

Is there any case where an acquire fence (not an operation) is necessary on x86?

Any input is welcome.

Answer 1

return *(uint32_t*)(m_head_memory_location + offset);

You cast to a non-atomic, non-volatile uint32_t* and dereference it!

The compiler is allowed to assume that this uint32_t object isn't written by anything else (i.e. it may assume there is no data-race UB), so it can and will hoist the load out of the loop, effectively transforming it into something like if((val=load) == 0) infinite_loop();

https://electronics.stackexchange.com/questions/387181/mcu-programming-c-o2-optimization-breaks-while-loop/387478#387478
Multithreading program stuck in optimized mode but runs normally in -O0
When to use volatile with multi threading? (Never; use atomic with mo_relaxed.)
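As a minimal sketch of the atomic alternative, assuming the shared word can be reached as a std::atomic<uint32_t> (how you obtain that object inside the mapping is not covered here), a polling read might look like the following. The name poll_nonzero and the max_spins bound are hypothetical, added only so the example terminates; the "spin while zero" direction follows the answer's reading of the loop.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: poll `word` up to `max_spins` times.
// Returns the first non-zero value seen, or 0 if it stayed zero.
inline uint32_t poll_nonzero(const std::atomic<uint32_t>& word,
                             std::size_t max_spins) {
    assert(max_spins > 0 && "need at least one attempt");
    for (std::size_t i = 0; i < max_spins; ++i) {
        // An atomic load cannot be hoisted out of the loop: every
        // iteration really re-reads memory. mo_relaxed would be enough
        // if only this word matters; acquire also orders later reads
        // of data the writer published before storing to `word`.
        const uint32_t val = word.load(std::memory_order_acquire);
        if (val != 0) {
            return val;
        }
    }
    return 0;
}
```

On x86 the acquire load compiles to an ordinary mov, so the atomic version costs nothing extra at run time; what it buys is that the compiler may no longer assume the value is unchanged between iterations.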

A GCC memory barrier will force a reload, but this is an implementation detail of std::atomic_thread_fence(std::memory_order_acquire). For x86, that barrier only needs to block compile-time reordering, so a typical implementation for GCC might be asm("" ::: "memory").

It's not the acquire ordering that's doing anything here; it's the "memory" clobber, which stops GCC from assuming another read will see the same value. ISO C++ does not guarantee that std::atomic_thread_fence(std::memory_order_acquire) has this effect for non-atomic variables. (It's always implied for atomic and volatile accesses.) So, as I said, this would work in GCC, but only as an implementation detail.

It's also strict-aliasing UB if this memory is ever accessed through types other than uint32_t and char*, or if the underlying memory was declared as a char[] array. If you got a char* from mmap or something similar, you're fine.

It's also possibly misalignment UB unless offset is known to be a multiple of 4. (Unless GCC chooses to auto-vectorize, though, this won't bite you in practice on x86.)

You can solve these two problems in GNU C with typedef uint32_t unaligned_u32 __attribute((may_alias, aligned(1))); but you still need volatile or atomic<T> for reading in a loop to work.
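A rough sketch of that typedef in use, assuming GCC or Clang (the attribute is a GNU extension, not ISO C++). The helper names load_u32_gnu and load_u32_memcpy are hypothetical; the memcpy variant is not from the answer and is included only as a portable point of comparison.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// GNU extension: a uint32_t that may alias any type and needs no alignment.
typedef uint32_t unaligned_u32 __attribute__((may_alias, aligned(1)));

// Hypothetical helper: volatile load through the may_alias/unaligned type.
// volatile forces a real load on every call, so this can sit in a loop,
// but it gives no inter-thread ordering guarantees in ISO C++.
inline uint32_t load_u32_gnu(const char* base, std::size_t offset) {
    assert(base != nullptr);
    return *reinterpret_cast<const volatile unaligned_u32*>(base + offset);
}

// Portable counterpart: memcpy avoids the aliasing and alignment UB,
// but the compiler may still hoist it out of a polling loop.
inline uint32_t load_u32_memcpy(const char* base, std::size_t offset) {
    assert(base != nullptr);
    uint32_t value;
    std::memcpy(&value, base + offset, sizeof value);
    return value;
}
```

Both helpers only address the aliasing and alignment problems; for a value written by another thread or process, an atomic access is still the option the C++ memory model actually defines.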

In general

Use std::atomic_thread_fence(std::memory_order_acquire); as required by the C++ memory model; that's what governs reordering at compile time.

When compiling for x86, it won't turn into any asm instructions; in asm it's a no-op. But if you don't tell the compiler that it can't reorder something, your code might break depending on the compiler optimization level.

You might get lucky and have the compiler do a non-atomic load after an atomic mo_relaxed load, or it might do the non-atomic load earlier if you don't tell it not to.
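To illustrate the case where an acquire fence (rather than an acquire operation) does matter, here is a minimal sketch of a relaxed flag load followed by a fence and then a non-atomic read. The Mailbox struct and the publish and try_consume functions are hypothetical names made up for this example, not part of the question's code.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical layout: plain payload guarded by an atomic flag.
struct Mailbox {
    uint32_t payload = 0;             // non-atomic data
    std::atomic<uint32_t> ready{0};   // 0 = empty, 1 = payload is valid
};

// Writer side: fill the payload, then publish the flag.
inline void publish(Mailbox& box, uint32_t value) {
    box.payload = value;
    // Keeps the payload store from being reordered after the flag store.
    std::atomic_thread_fence(std::memory_order_release);
    box.ready.store(1, std::memory_order_relaxed);
}

// Reader side: returns true and fills *out once the flag has been seen.
inline bool try_consume(Mailbox& box, uint32_t* out) {
    assert(out != nullptr);
    if (box.ready.load(std::memory_order_relaxed) == 0) {
        return false;
    }
    // No instruction on x86, but without it the compiler would be free
    // to perform the payload load before the flag load.
    std::atomic_thread_fence(std::memory_order_acquire);
    *out = box.payload;
    return true;
}
```

A reader would call try_consume in a loop until it returns true. The fences pair up through the relaxed flag accesses, which is what makes the plain payload read well-defined.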

Answer 1
5 2019-12-18 15:59:35