
Is there any compiler barrier which is equal to asm("" ::: "memory") in C++11?

My test code is as below, and I found that only memory_order_seq_cst forbade the compiler's reordering.

#include <atomic>

using namespace std;

int A, B = 1;

void func(void) {
    A = B + 1;
    atomic_thread_fence(memory_order_seq_cst);
    B = 0;
}

And other choices such as memory_order_release, memory_order_acq_rel did not generate any compiler barrier at all.

I think they must work with an atomic variable, just as below.

#include <atomic>

using namespace std;

atomic<int> A(0);
int B = 1;

void func(void) {
    A.store(B+1, memory_order_release);
    B = 0;
}

But I do not want to use an atomic variable. At the same time, I think asm("" ::: "memory") is too low-level.

Is there any better choice?

re: your edit:

But I do not want to use an atomic variable.

Why not? If it's for performance reasons, use them with memory_order_relaxed and atomic_signal_fence(mo_whatever) to block compiler reordering without any runtime overhead other than the compiler barrier potentially blocking some compile-time optimizations, depending on the surrounding code.
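A minimal sketch of that suggestion, reusing the variables from the question (the choice of memory_order_seq_cst for the signal fence is mine, picked to get the strongest compiler-only barrier):

#include <atomic>

using namespace std;

atomic<int> A(0);
int B = 1;

void func(void) {
    // Relaxed store: an atomic access with no run-time ordering requirement.
    A.store(B + 1, memory_order_relaxed);
    // Compiler barrier only: compiles to zero instructions, but the compiler
    // may not move the store to B above this point.
    atomic_signal_fence(memory_order_seq_cst);
    B = 0;
}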

If it's for some other reason, then maybe atomic_signal_fence will give you code that happens to work on your target platform. I suspect that it does order non-atomic<> loads and/or stores, so it might even help avoid data-race Undefined Behaviour in C++.


Sufficient for what?

Regardless of any barriers, if two threads run this function at the same time, your program has Undefined Behaviour because of concurrent access to non-atomic<> variables. So the only way this code can be useful is if you're talking about synchronizing with a signal handler that runs in the same thread.

That would also be consistent with asking for a "compiler barrier", to only prevent reordering at compile time, because out-of-order execution and memory reordering always preserve the behaviour of a single thread. So you never need extra barrier instructions to make sure you see your own operations in program order, you just need to stop the compiler reordering stuff at compile time. See Jeff Preshing's post: Memory Ordering at Compile Time.

This is what atomic_signal_fence is for. You can use it with any std::memory_order, just like thread_fence, to get different strengths of barrier and only prevent the optimizations you need to prevent.
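As a hedged sketch of that signal-handler use case (the handler, publish, data, and ready names are mine, not from the question):

#include <atomic>
#include <csignal>

using namespace std;

int data = 0;                     // payload written before the flag
volatile sig_atomic_t ready = 0;  // flag the signal handler checks

void handler(int) {
    if (ready) {
        // Acquire-side compiler barrier: the compiler may not hoist the
        // read of data above the read of ready.
        atomic_signal_fence(memory_order_acquire);
        // ... use data ...
    }
}

void publish(void) {
    data = 42;
    // Release-side compiler barrier: the store to data may not sink below
    // the store to ready. No barrier instruction is emitted.
    atomic_signal_fence(memory_order_release);
    ready = 1;
}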


... atomic_thread_fence(memory_order_acq_rel) did not generate any compiler barrier at all!

Totally wrong, in several ways.

atomic_thread_fence is a compiler barrier plus whatever run-time barriers are necessary to restrict reordering in the order our loads/stores become visible to other threads.

I'm guessing you mean it didn't emit any barrier instructions when you looked at the asm output for x86. Instructions like x86's MFENCE are not "compiler barriers", they're run-time memory barriers and prevent even StoreLoad reordering at run-time. (That's the only reordering that x86 allows. SFENCE and LFENCE are only needed when using weakly-ordered (NT) stores, like MOVNTPS (_mm_stream_ps).)
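As a hedged illustration of that last point (my own example, not from the question): an NT store is weakly ordered even on x86, so an explicit sfence is needed before a plain release store can safely publish the streamed data to other threads.

#include <immintrin.h>
#include <atomic>

using namespace std;

alignas(16) float buf[4];
atomic<bool> done(false);

void produce(void) {
    _mm_stream_ps(buf, _mm_set1_ps(1.0f));  // movntps: weakly-ordered NT store
    _mm_sfence();                           // drain NT stores before publishing
    done.store(true, memory_order_release);
}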

On a weakly-ordered ISA like ARM, thread_fence(mo_acq_rel) isn't free, and compiles to an instruction. gcc5.4 uses dmb ish. (See it on the Godbolt compiler explorer.)

A compiler barrier just prevents reordering at compile time, without necessarily preventing run-time reordering. So even on ARM, atomic_signal_fence(mo_seq_cst) compiles to no instructions.

A weak enough barrier allows the compiler to do the store to B ahead of the store to A if it wants, but gcc happens to decide to still do them in source order even with thread_fence(mo_acquire) (which shouldn't order stores with respect to other stores).

So this example doesn't really test whether something is a compiler barrier or not.


Strange compiler behaviour from gcc, for an example that is different with a compiler barrier:

See this source+asm on Godbolt.

#include <atomic>
using namespace std;
int A,B;

void foo() {
  A = 0;
  atomic_thread_fence(memory_order_release);
  B = 1;
  //asm volatile(""::: "memory");
  //atomic_signal_fence(memory_order_release);
  atomic_thread_fence(memory_order_release);
  A = 2;
}

This compiles with clang the way you'd expect: the thread_fence is a StoreStore barrier, so the A=0 has to happen before B=1, and can't be merged with the A=2.

    # clang3.9 -O3
    mov     dword ptr [rip + A], 0
    mov     dword ptr [rip + B], 1
    mov     dword ptr [rip + A], 2
    ret

But with gcc, the barrier has no effect, and only the final store to A is present in the asm output.

    # gcc6.2 -O3
    mov     DWORD PTR B[rip], 1
    mov     DWORD PTR A[rip], 2
    ret

But with atomic_signal_fence(memory_order_release), gcc's output matches clang. So atomic_signal_fence(mo_release) is having the barrier effect we expect, but atomic_thread_fence with anything weaker than seq_cst isn't acting as a compiler barrier at all.

One theory here is that gcc knows it's officially Undefined Behaviour for multiple threads to write to non-atomic<> variables. This doesn't hold much water, because atomic_thread_fence should still work if used to synchronize with a signal handler, it's just stronger than necessary.

BTW, with atomic_thread_fence(memory_order_seq_cst), we get the expected result:

    # gcc6.2 -O3, with a mo_seq_cst barrier
    mov     DWORD PTR A[rip], 0
    mov     DWORD PTR B[rip], 1
    mfence
    mov     DWORD PTR A[rip], 2
    ret

We get this even with only one barrier, which would still allow the A=0 and A=2 stores to happen one after the other, so the compiler is allowed to merge them across a barrier. (Observers failing to see separate A=0 and A=2 values is a possible ordering, so the compiler can decide that's what always happens.) Current compilers don't usually do this kind of optimization, though. See discussion at the end of my answer on Can num++ be atomic for 'int num'?
