Does standard C++11 guarantee that memory_order_seq_cst prevents StoreLoad reordering of non-atomic accesses around an atomic?

Question

Does standard C++11 guarantee that memory_order_seq_cst prevents StoreLoad reordering around an atomic operation for non-atomic memory accesses?

As is known, C++11 defines 6 std::memory_order values, which specify how regular, non-atomic memory accesses are to be ordered around an atomic operation - Working Draft, Standard for Programming Language C++ 2016-07-12: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/n4606.pdf

§ 29.3 Order and consistency

§ 29.3 / 1

The enumeration memory_order specifies the detailed regular (non-atomic) memory synchronization order as defined in 1.10 and may provide for operation ordering. Its enumerated values and their meanings are as follows:

It is also known that these 6 memory orders each prevent some of the possible reorderings (LoadLoad, LoadStore, StoreStore, StoreLoad).

But does memory_order_seq_cst prevent StoreLoad reordering around an atomic operation for regular, non-atomic memory accesses, or only for other atomic operations that also use memory_order_seq_cst?

I.e., to prevent this StoreLoad reordering, should we use std::memory_order_seq_cst for both the STORE and the LOAD, or only for one of them?

std::atomic<int> a, b;
b.store(1, std::memory_order_seq_cst); // Sequential Consistency
a.load(std::memory_order_seq_cst); // Sequential Consistency

Acquire-Release semantics are clear: they specify exactly which non-atomic memory accesses may be reordered across atomic operations: http://en.cppreference.com/w/cpp/atomic/memory_order
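A minimal sketch of what the Acquire-Release guarantee means for non-atomic data, assuming a two-thread message-passing pattern. The function name message_passing_once and the variables payload and ready are hypothetical, not taken from the question:

```cpp
#include <atomic>
#include <thread>

// Returns the value the consumer read from the non-atomic payload.
int message_passing_once() {
    int payload = 0;                  // regular, non-atomic data
    std::atomic<bool> ready{false};
    int seen = -1;

    std::thread producer([&] {
        payload = 42;                                  // non-atomic store
        ready.store(true, std::memory_order_release);  // earlier writes may not move below this store
    });
    std::thread consumer([&] {
        // Later reads may not move above an acquire load that observed `true`.
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        seen = payload;                                // non-atomic load
    });

    producer.join();
    consumer.join();
    return seen;
}
```

Once the acquire load has observed the release store, the write to payload happens before the read, so the consumer must read 42. Note that nothing here depends on StoreLoad ordering.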

To prevent StoreLoad reordering we should use std::memory_order_seq_cst.

Two examples:

std::memory_order_seq_cst for both STORE and LOAD: there is an MFENCE

StoreLoad can't be reordered - GCC 6.1.0 x86_64: https://godbolt.org/g/mVZJs0

std::atomic<int> a, b;
b.store(1, std::memory_order_seq_cst); // can't be executed after LOAD
a.load(std::memory_order_seq_cst); // can't be executed before STORE
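This is one half of the classic store-buffering litmus test. A sketch, assuming a second thread runs the mirrored sequence (count_both_zero_seq_cst is a hypothetical helper): with seq_cst on all four operations, the outcome where both loads read 0 cannot be fitted into a single total order, so the standard rules it out.

```cpp
#include <atomic>
#include <thread>

// Runs the store-buffering test `rounds` times with seq_cst everywhere and
// returns how many rounds ended with both loads reading 0.
int count_both_zero_seq_cst(int rounds) {
    int both_zero = 0;
    for (int i = 0; i < rounds; ++i) {
        std::atomic<int> a{0}, b{0};
        int r1 = -1, r2 = -1;

        std::thread t1([&] {
            b.store(1, std::memory_order_seq_cst);
            r1 = a.load(std::memory_order_seq_cst);
        });
        std::thread t2([&] {
            a.store(1, std::memory_order_seq_cst);
            r2 = b.load(std::memory_order_seq_cst);
        });
        t1.join();
        t2.join();

        if (r1 == 0 && r2 == 0) {
            ++both_zero;  // would mean each load ran "before" the other thread's store
        }
    }
    return both_zero;
}
```

A run that returns 0 proves nothing on its own. The guarantee comes from the standard, and the test only fails to contradict it.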

std::memory_order_seq_cst for LOAD only: there is no MFENCE

StoreLoad can be reordered - GCC 6.1.0 x86_64: https://godbolt.org/g/2NLy12

std::atomic<int> a, b;
b.store(1, std::memory_order_release); // can be executed after LOAD
a.load(std::memory_order_seq_cst); // can be executed before STORE

Also, a C/C++ compiler could use the alternative mapping of C/C++11 to x86, which flushes the store buffer before the LOAD: MFENCE,MOV (from memory), so we must use std::memory_order_seq_cst for the LOAD too: http://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html This example is discussed in another question as approach (3): Does it make any sense instruction LFENCE in processors x86/x86_64?

I.e., we should use std::memory_order_seq_cst for both STORE and LOAD to guarantee that an MFENCE is generated, which prevents StoreLoad reordering.

Is it true that memory_order_seq_cst for an atomic Load or Store:

specifies Acquire-Release semantics - preventing LoadLoad, LoadStore and StoreStore reordering around an atomic operation for regular, non-atomic memory accesses,
but prevents StoreLoad reordering around an atomic operation only for other atomic operations with the same memory_order_seq_cst?
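To make the stakes concrete, here is a sketch of a Dekker-style "try to enter" round in which StoreLoad ordering between two seq_cst atomics protects a non-atomic variable. The names try_enter_round, flag_a, flag_b and entries are hypothetical. The non-atomic access is safe here not because seq_cst orders it directly, but because at most one thread can ever reach it:

```cpp
#include <atomic>
#include <thread>

// One round of the protocol. Returns how many threads entered the
// critical section; with seq_cst on all flag operations this is 0 or 1.
int try_enter_round() {
    std::atomic<int> flag_a{0}, flag_b{0};
    int entries = 0;  // non-atomic, touched only inside the critical section

    std::thread ta([&] {
        flag_a.store(1, std::memory_order_seq_cst);          // announce intent
        if (flag_b.load(std::memory_order_seq_cst) == 0) {   // other side not announced yet
            ++entries;                                       // critical section
        }
    });
    std::thread tb([&] {
        flag_b.store(1, std::memory_order_seq_cst);
        if (flag_a.load(std::memory_order_seq_cst) == 0) {
            ++entries;
        }
    });

    ta.join();
    tb.join();
    return entries;
}
```

If either store or load were weakened (for example to release/acquire), both threads could read 0 and the increment of entries would become a data race.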

Answer 1

No, standard C++11 doesn't guarantee that memory_order_seq_cst prevents StoreLoad reordering of non-atomic accesses around an atomic(seq_cst).

Standard C++11 doesn't even guarantee that memory_order_seq_cst prevents StoreLoad reordering of atomic(non-seq_cst) around an atomic(seq_cst).

Working Draft, Standard for Programming Language C++ 2016-07-12: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/n4606.pdf

There shall be a single total order S on all memory_order_seq_cst operations - C++11 Standard:

§ 29.3 / 3

There shall be a single total order S on all memory_order_seq_cst operations, consistent with the “happens before” order and modification orders for all affected locations, such that each memory_order_seq_cst operation B that loads a value from an atomic object M observes one of the following values: ...
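A sketch of what the single total order S buys, using the "independent reads of independent writes" pattern (count_iriw_disagreements and the variable names are hypothetical): when every operation is seq_cst, two readers cannot disagree about the order in which two independent stores happened.

```cpp
#include <atomic>
#include <thread>

// Returns how many rounds ended with the two readers observing the stores
// to x and y in opposite orders. With seq_cst everywhere this must be 0.
int count_iriw_disagreements(int rounds) {
    int disagreements = 0;
    for (int i = 0; i < rounds; ++i) {
        std::atomic<int> x{0}, y{0};
        int r1 = -1, r2 = -1, r3 = -1, r4 = -1;

        std::thread w1([&] { x.store(1, std::memory_order_seq_cst); });
        std::thread w2([&] { y.store(1, std::memory_order_seq_cst); });
        std::thread rd1([&] {
            r1 = x.load(std::memory_order_seq_cst);
            r2 = y.load(std::memory_order_seq_cst);
        });
        std::thread rd2([&] {
            r3 = y.load(std::memory_order_seq_cst);
            r4 = x.load(std::memory_order_seq_cst);
        });
        w1.join();
        w2.join();
        rd1.join();
        rd2.join();

        // rd1 saw x but not y (x first in S); rd2 saw y but not x (y first in S).
        if (r1 == 1 && r2 == 0 && r3 == 1 && r4 == 0) {
            ++disagreements;
        }
    }
    return disagreements;
}
```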

But any atomic operation with ordering weaker than memory_order_seq_cst has neither sequential consistency nor a place in the single total order, i.e. non-memory_order_seq_cst operations can be reordered with memory_order_seq_cst operations in the allowed directions - C++11 Standard:

§ 29.3 / 8

[ Note: memory_order_seq_cst ensures sequential consistency only for a program that is free of data races and uses exclusively memory_order_seq_cst operations. Any use of weaker ordering will invalidate this guarantee unless extreme care is used. In particular, memory_order_seq_cst fences ensure a total order only for the fences themselves. Fences cannot, in general, be used to restore sequential consistency for atomic operations with weaker ordering specifications. — end note ]
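The note says fences cannot restore sequential consistency in general. For the narrow store-buffering pattern, though, the fence rules in § 29.3 are enough. A sketch, assuming relaxed operations separated by a seq_cst fence in each thread (count_both_zero_with_fences is a hypothetical helper):

```cpp
#include <atomic>
#include <thread>

// Store-buffering test with relaxed atomics and a seq_cst fence between the
// store and the load in each thread. Returns how many rounds ended with both
// loads reading 0.
int count_both_zero_with_fences(int rounds) {
    int both_zero = 0;
    for (int i = 0; i < rounds; ++i) {
        std::atomic<int> a{0}, b{0};
        int r1 = -1, r2 = -1;

        std::thread t1([&] {
            b.store(1, std::memory_order_relaxed);
            std::atomic_thread_fence(std::memory_order_seq_cst);
            r1 = a.load(std::memory_order_relaxed);
        });
        std::thread t2([&] {
            a.store(1, std::memory_order_relaxed);
            std::atomic_thread_fence(std::memory_order_seq_cst);
            r2 = b.load(std::memory_order_relaxed);
        });
        t1.join();
        t2.join();

        if (r1 == 0 && r2 == 0) {
            ++both_zero;
        }
    }
    return both_zero;
}
```

The two fences are ordered in S, so the load that follows the later fence must observe the store that precedes the earlier one. This applies to atomic objects only. It does not make concurrent non-atomic accesses legal.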

C++ compilers also allow such reorderings:

On x86_64

Usually - if the compiler implements seq_cst as a barrier after the store, then:

STORE-C(relaxed); LOAD-B(seq_cst); can be reordered to LOAD-B(seq_cst); STORE-C(relaxed);

Asm generated by GCC 7.0 x86_64: https://godbolt.org/g/4yyeby

Also theoretically possible - if the compiler implements seq_cst as a barrier before the load, then:

STORE-A(seq_cst); LOAD-C(acq_rel); can be reordered to LOAD-C(acq_rel); STORE-A(seq_cst);

On PowerPC

STORE-A(seq_cst); LOAD-C(relaxed); can be reordered to LOAD-C(relaxed); STORE-A(seq_cst);

The following reordering is also possible on PowerPC:

STORE-A(seq_cst); STORE-C(relaxed); can be reordered to STORE-C(relaxed); STORE-A(seq_cst);

If even atomic variables are allowed to be reordered across an atomic(seq_cst), then non-atomic variables can also be reordered across an atomic(seq_cst).

Asm generated by GCC 4.8 PowerPC: https://godbolt.org/g/BTQBr8

More details:

On x86_64

STORE-C(release); LOAD-B(seq_cst); can be reordered to LOAD-B(seq_cst); STORE-C(release);

Intel® 64 and IA-32 Architectures

8.2.3.4 Loads May Be Reordered with Earlier Stores to Different Locations

I.e. this x86_64 code:

STORE-A(seq_cst);
STORE-C(release); 
LOAD-B(seq_cst);

Can be reordered to:

STORE-A(seq_cst);
LOAD-B(seq_cst);
STORE-C(release);

This can happen because there is no mfence between c.store and b.load:

x86_64 - GCC 7.0: https://godbolt.org/g/dRGTaO

C++ & asm code:

#include <atomic>

// Atomic load-store
void test() {
    std::atomic<int> a, b, c;
    a.store(2, std::memory_order_seq_cst);          // movl 2,[a]; mfence;
    c.store(4, std::memory_order_release);          // movl 4,[c];
    int tmp = b.load(std::memory_order_seq_cst);    // movl [b],[tmp];
}

It can be reordered to:

#include <atomic>

// Atomic load-store
void test() {
    std::atomic<int> a, b, c;
    a.store(2, std::memory_order_seq_cst);          // movl 2,[a]; mfence;
    int tmp = b.load(std::memory_order_seq_cst);    // movl [b],[tmp];
    c.store(4, std::memory_order_release);          // movl 4,[c];
}

Also, sequential consistency on x86/x86_64 can be implemented in four ways: http://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html

1. LOAD (without fence) and STORE + MFENCE

2. LOAD (without fence) and LOCK XCHG

3. MFENCE + LOAD and STORE (without fence)

4. LOCK XADD ( 0 ) and STORE (without fence)

Ways 1 and 2: LOAD and ( STORE + MFENCE )/( LOCK XCHG ) - reviewed above
Ways 3 and 4: ( MFENCE + LOAD )/ LOCK XADD and STORE - allow the following reordering:

STORE-A(seq_cst); LOAD-C(acq_rel); can be reordered to LOAD-C(acq_rel); STORE-A(seq_cst);
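For experimenting with the mappings listed above in a compiler explorer, a small sketch (g_flag and the three function names are hypothetical). Which instruction sequence a given compiler emits for each function is an assumption to check against its actual output. It depends on the compiler and its version:

```cpp
#include <atomic>

std::atomic<int> g_flag{0};  // global, so the operations cannot be analysed away as local

// seq_cst store: on x86 compilers typically emit either MOV + MFENCE or XCHG.
void store_plain() {
    g_flag.store(1, std::memory_order_seq_cst);
}

// seq_cst exchange: a read-modify-write that writes 2 and returns the old value.
int store_exchange() {
    return g_flag.exchange(2, std::memory_order_seq_cst);
}

// seq_cst load: under the mappings that fence the store, this is a plain MOV.
int load_plain() {
    return g_flag.load(std::memory_order_seq_cst);
}
```

Whichever mapping is chosen, it has to be used consistently for all seq_cst loads and stores in the program. That is why the fence can sit on only one side.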

On PowerPC

STORE-A(seq_cst); LOAD-C(relaxed); can be reordered to LOAD-C(relaxed); STORE-A(seq_cst);

PowerPC allows Store-Load reordering ( Table 5 - PowerPC ): http://www.rdrop.com/users/paulmck/scalability/paper/whymb.2010.06.07c.pdf

Stores Reordered After Loads

I.e. this PowerPC code:

STORE-A(seq_cst);
STORE-C(relaxed); 
LOAD-C(relaxed); 
LOAD-B(seq_cst);

Can be reordered to:

LOAD-C(relaxed);
STORE-A(seq_cst);
STORE-C(relaxed); 
LOAD-B(seq_cst);

PowerPC - GCC 4.8: https://godbolt.org/g/xowFD3

C++ & asm code:

#include <atomic>

// Atomic load-store
void test() {
    std::atomic<int> a, b, c;       // addr: 20, 24, 28
    a.store(2, std::memory_order_seq_cst);          // li r9<-2; sync; stw r9->[a];
    c.store(4, std::memory_order_relaxed);          // li r9<-4; stw r9->[c];
    c.load(std::memory_order_relaxed);              // lwz r9<-[c];
    int tmp = b.load(std::memory_order_seq_cst);    // sync; lwz r9<-[b]; ... isync;
}

By splitting a.store into two parts, it can be reordered to:

#include <atomic>

// Atomic load-store
void test() {
    std::atomic<int> a, b, c;       // addr: 20, 24, 28
    //a.store(2, std::memory_order_seq_cst);            // part-1: li r9<-2; sync;
    c.load(std::memory_order_relaxed);              // lwz r9<-[c];
    a.store(2, std::memory_order_seq_cst);          // part-2: stw r9->[a];
    c.store(4, std::memory_order_relaxed);          // li r9<-4; stw r9->[c];
    int tmp = b.load(std::memory_order_seq_cst);    // sync; lwz r9<-[b]; ... isync;
}

Here the load from memory lwz r9<-[c]; is executed earlier than the store to memory stw r9->[a];.

The following reordering is also possible on PowerPC:

STORE-A(seq_cst); STORE-C(relaxed); can be reordered to STORE-C(relaxed); STORE-A(seq_cst);

Because PowerPC has a weak memory ordering model, it allows Store-Store reordering ( Table 5 - PowerPC ): http://www.rdrop.com/users/paulmck/scalability/paper/whymb.2010.06.07c.pdf

Stores Reordered After Stores

I.e. on PowerPC a Store can be reordered with another Store, so the previous example can be reordered as follows:

#include <atomic>

// Atomic load-store
void test() {
    std::atomic<int> a, b, c;       // addr: 20, 24, 28
    //a.store(2, std::memory_order_seq_cst);            // part-1: li r9<-2; sync;
    c.load(std::memory_order_relaxed);              // lwz r9<-[c];
    c.store(4, std::memory_order_relaxed);          // li r9<-4; stw r9->[c];
    a.store(2, std::memory_order_seq_cst);          // part-2: stw r9->[a];
    int tmp = b.load(std::memory_order_seq_cst);    // sync; lwz r9<-[b]; ... isync;
}

Here the store to memory stw r9->[c]; is executed earlier than the store to memory stw r9->[a];.
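If a reader must be guaranteed to see the store to a once it has seen the store to c, the ordering has to come from the later store, not from the seq_cst on the earlier one. A sketch, assuming the store to c is strengthened to release and paired with an acquire load (observe_a_after_c and the thread names are hypothetical):

```cpp
#include <atomic>
#include <thread>

// Returns the value of `a` that the reader observed after it saw c == 4.
int observe_a_after_c() {
    std::atomic<int> a{0}, c{0};
    int seen_a = -1;

    std::thread writer([&] {
        a.store(2, std::memory_order_seq_cst);
        // release instead of relaxed: the store to `a` may not become
        // visible after this store for a thread that acquires `c`.
        c.store(4, std::memory_order_release);
    });
    std::thread reader([&] {
        while (c.load(std::memory_order_acquire) != 4) {
            std::this_thread::yield();
        }
        seen_a = a.load(std::memory_order_relaxed);
    });

    writer.join();
    reader.join();
    return seen_a;
}
```

With c.store left as relaxed, there is no synchronizes-with edge, and the reader would be allowed to see c == 4 while still reading 0 from a.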

Answer 2

std::memory_order_seq_cst guarantees that there is no reordering by either the compiler or the CPU. In this case the memory order is the same as if only one instruction were executed at a time.

But compiler optimization confuses the issue: if you turn off -O3, then the fence is there.

With -O3, the compiler can see that the mfence has no consequence in your test program, because the program is too simple.

If, on the other hand, you run it on ARM like this, you can see the dmb ish barriers.

So if your program is more complex you might see the mfence in this part of the code, but not if the compiler can analyse the code and reason that it is not needed.

2 answers

Answer 1
4 Accepted 2017-03-17 12:00:00

Answer 2
0 2016-08-20 12:00:16
