Does standard C++11 guarantee that memory_order_seq_cst prevents StoreLoad reordering of non-atomic accesses around an atomic operation?
Does standard C++11 guarantee that memory_order_seq_cst prevents StoreLoad reordering around an atomic operation for non-atomic memory accesses?
As is well known, C++11 defines six std::memory_order values, which specify how regular, non-atomic memory accesses are to be ordered around an atomic operation. From the Working Draft, Standard for Programming Language C++, 2016-07-12: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/n4606.pdf
§ 29.3 Order and consistency
§ 29.3 / 1
The enumeration memory_order specifies the detailed regular (non-atomic) memory synchronization order as defined in 1.10 and may provide for operation ordering. Its enumerated values and their meanings are as follows:
It is also known that each of these six memory orders prevents some of the four kinds of reordering (LoadLoad, LoadStore, StoreLoad, StoreStore).
But does memory_order_seq_cst prevent StoreLoad reordering around an atomic operation for regular, non-atomic memory accesses, or only for other atomic operations that also use memory_order_seq_cst?
I.e., to prevent this StoreLoad reordering, should we use std::memory_order_seq_cst for both the STORE and the LOAD, or only for one of them?
std::atomic<int> a, b;
b.store(1, std::memory_order_seq_cst); // Sequential Consistency
a.load(std::memory_order_seq_cst); // Sequential Consistency
The Acquire-Release semantics are clear: they specify exactly which non-atomic memory accesses may be reordered across atomic operations: http://en.cppreference.com/w/cpp/atomic/memory_order
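As a minimal sketch of what acquire-release guarantees for non-atomic data (the names payload, ready, and run_message_passing are illustrative and do not come from the question): a release store publishes an ordinary int, and an acquire load that observes the flag is then guaranteed to see that int.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Illustrative names (assumptions): `payload` is ordinary non-atomic data,
// `ready` is the atomic flag that publishes it.
int payload = 0;
std::atomic<bool> ready{false};

// Returns the value the consumer thread read from `payload`.
int run_message_passing() {
    payload = 0;
    ready.store(false, std::memory_order_relaxed);
    int seen = -1;

    std::thread producer([] {
        payload = 42;                                  // non-atomic store
        ready.store(true, std::memory_order_release);  // earlier writes cannot move below this
    });
    std::thread consumer([&seen] {
        while (!ready.load(std::memory_order_acquire)) {  // later reads cannot move above this
            std::this_thread::yield();
        }
        seen = payload;  // non-atomic load, ordered after the acquire load
    });

    producer.join();
    consumer.join();
    assert(seen == 42);  // follows from the release/acquire synchronization
    return seen;
}
```

The acquire load that reads true synchronizes with the release store, so the consumer reads 42. With memory_order_relaxed on both sides, the accesses to payload would be a data race.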
To prevent StoreLoad reordering, we should use std::memory_order_seq_cst.
Two examples:
std::memory_order_seq_cst for both STORE and LOAD: there is an MFENCE
StoreLoad can't be reordered - GCC 6.1.0 x86_64: https://godbolt.org/g/mVZJs0
std::atomic<int> a, b;
b.store(1, std::memory_order_seq_cst); // can't be executed after LOAD
a.load(std::memory_order_seq_cst); // can't be executed before STORE
std::memory_order_seq_cst for LOAD only: there is no MFENCE
StoreLoad can be reordered - GCC 6.1.0 x86_64: https://godbolt.org/g/2NLy12
std::atomic<int> a, b;
b.store(1, std::memory_order_release); // can be executed after LOAD
a.load(std::memory_order_seq_cst); // can be executed before STORE
Also, if the C/C++ compiler used the alternative mapping of C/C++11 to x86, which flushes the store buffer before the LOAD (MFENCE, MOV (from memory)), then we must use std::memory_order_seq_cst for the LOAD too: http://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html This example is discussed as approach (3) in another question: Does it make any sense instruction LFENCE in processors x86/x86_64?
I.e., we should use std::memory_order_seq_cst for both STORE and LOAD to guarantee that an MFENCE is generated, which prevents StoreLoad reordering.
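The STORE-then-LOAD pattern discussed here is the classic store-buffering test. The following is a sketch only, with illustrative names (x, y, count_both_zero) that are not from the question. When all four operations are seq_cst, the C++ memory model forbids the outcome in which both threads read 0, whichever hardware mapping the compiler chooses.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Illustrative names (assumptions), not from the question.
std::atomic<int> x{0}, y{0};

// Runs the store-buffering test `iterations` times and returns how often
// both threads read 0, i.e. how often each load appeared to take effect
// before the other thread's store.
int count_both_zero(int iterations) {
    assert(iterations > 0);
    int both_zero = 0;
    for (int i = 0; i < iterations; ++i) {
        x.store(0, std::memory_order_seq_cst);
        y.store(0, std::memory_order_seq_cst);
        int r1 = -1, r2 = -1;
        std::thread t1([&r1] {
            x.store(1, std::memory_order_seq_cst);   // STORE
            r1 = y.load(std::memory_order_seq_cst);  // LOAD
        });
        std::thread t2([&r2] {
            y.store(1, std::memory_order_seq_cst);   // STORE
            r2 = x.load(std::memory_order_seq_cst);  // LOAD
        });
        t1.join();
        t2.join();
        if (r1 == 0 && r2 == 0) ++both_zero;
    }
    return both_zero;
}
```

With seq_cst on every operation the function returns 0. Weakening either store to memory_order_release removes that guarantee, so a nonzero count becomes possible, although it may be rare in practice.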
Is it true that memory_order_seq_cst for an atomic Load or Store:
specifies Acquire-Release semantics, i.e. prevents LoadLoad, LoadStore, and StoreStore reordering around an atomic operation for regular, non-atomic memory accesses,
but prevents StoreLoad reordering around an atomic operation only for other atomic operations that also use memory_order_seq_cst?
No, standard C++11 doesn't guarantee that memory_order_seq_cst prevents StoreLoad reordering of non-atomic accesses around an atomic(seq_cst) operation.
Standard C++11 doesn't even guarantee that memory_order_seq_cst prevents StoreLoad reordering of atomic(non-seq_cst) operations around an atomic(seq_cst) operation.
Working Draft, Standard for Programming Language C++ 2016-07-12: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/n4606.pdf
There shall be a single total order S on all memory_order_seq_cst operations - C++11 Standard:
§ 29.3 / 3
There shall be a single total order S on all memory_order_seq_cst operations, consistent with the "happens before" order and modification orders for all affected locations, such that each memory_order_seq_cst operation B that loads a value from an atomic object M observes one of the following values: ...
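One way to picture the single total order S is the independent-reads-of-independent-writes test. This is a sketch with illustrative names (x, y, count_disagreements), not code from the answer. Two threads write different variables and two other threads read both variables in opposite orders. Because every operation is seq_cst, the readers cannot disagree about which write came first.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Illustrative names (assumptions), not from the answer.
std::atomic<int> x{0}, y{0};

// Returns how many runs had the two readers observe the two independent
// writes in opposite orders.
int count_disagreements(int iterations) {
    assert(iterations > 0);
    int disagreements = 0;
    for (int i = 0; i < iterations; ++i) {
        x.store(0, std::memory_order_seq_cst);
        y.store(0, std::memory_order_seq_cst);
        int r1 = -1, r2 = -1, r3 = -1, r4 = -1;
        std::thread w1([] { x.store(1, std::memory_order_seq_cst); });
        std::thread w2([] { y.store(1, std::memory_order_seq_cst); });
        std::thread rd1([&r1, &r2] {
            r1 = x.load(std::memory_order_seq_cst);
            r2 = y.load(std::memory_order_seq_cst);
        });
        std::thread rd2([&r3, &r4] {
            r3 = y.load(std::memory_order_seq_cst);
            r4 = x.load(std::memory_order_seq_cst);
        });
        w1.join();
        w2.join();
        rd1.join();
        rd2.join();
        // rd1 saw x written before y, rd2 saw y written before x:
        // both cannot hold under one total order S.
        if (r1 == 1 && r2 == 0 && r3 == 1 && r4 == 0) ++disagreements;
    }
    return disagreements;
}
```

The function returns 0 when all operations are seq_cst. With acquire loads and release stores instead, the standard no longer rules out the disagreeing outcome.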
But atomic operations with an ordering weaker than memory_order_seq_cst have neither sequential consistency nor a single total order, i.e. non-memory_order_seq_cst operations can be reordered with memory_order_seq_cst operations in the allowed directions - C++11 Standard:
§ 29.3 / 8
[ Note: memory_order_seq_cst ensures sequential consistency only for a program that is free of data races and uses exclusively memory_order_seq_cst operations. Any use of weaker ordering will invalidate this guarantee unless extreme care is used. In particular, memory_order_seq_cst fences ensure a total order only for the fences themselves. Fences cannot, in general, be used to restore sequential consistency for atomic operations with weaker ordering specifications. — end note ]
C++ compilers also allow such reorderings:
Usually, if the compiler implements seq_cst as a barrier after the store, then:
STORE-C(relaxed);
LOAD-B(seq_cst);
can be reordered to:
LOAD-B(seq_cst);
STORE-C(relaxed);
Screenshot of Asm generated by GCC 7.0 x86_64: https://godbolt.org/g/4yyeby
Also, theoretically possible: if the compiler implements seq_cst as a barrier before the load, then:
STORE-A(seq_cst);
LOAD-C(acquire);
can be reordered to:
LOAD-C(acquire);
STORE-A(seq_cst);
and
STORE-A(seq_cst);
LOAD-C(relaxed);
can be reordered to:
LOAD-C(relaxed);
STORE-A(seq_cst);
The following reordering is also possible on PowerPC:
STORE-A(seq_cst);
STORE-C(relaxed);
can be reordered to:
STORE-C(relaxed);
STORE-A(seq_cst);
If even atomic variables are allowed to be reordered across atomic(seq_cst), then non-atomic variables can also be reordered across atomic(seq_cst).
Screenshot of Asm generated by GCC 4.8 PowerPC: https://godbolt.org/g/BTQBr8
More details:
STORE-C(release);
LOAD-B(seq_cst);
can be reordered to:
LOAD-B(seq_cst);
STORE-C(release);
This is permitted by the x86 memory model (Intel® 64 and IA-32 Architectures, 8.2.3.4 Loads May Be Reordered with Earlier Stores to Different Locations).
I.e. x86_64 code:
STORE-A(seq_cst);
STORE-C(release);
LOAD-B(seq_cst);
Can be reordered to: 可以重新排序:
STORE-A(seq_cst);
LOAD-B(seq_cst);
STORE-C(release);
This can happen because there is no mfence between c.store and b.load:
x86_64 - GCC 7.0: https://godbolt.org/g/dRGTaO
C++ & asm code:
#include <atomic>
// Atomic load-store
void test() {
    std::atomic<int> a, b, c;
    a.store(2, std::memory_order_seq_cst);        // movl 2,[a]; mfence;
    c.store(4, std::memory_order_release);        // movl 4,[c];
    int tmp = b.load(std::memory_order_seq_cst);  // movl [b],[tmp];
}
It can be reordered to:
#include <atomic>
// Atomic load-store
void test() {
    std::atomic<int> a, b, c;
    a.store(2, std::memory_order_seq_cst);        // movl 2,[a]; mfence;
    int tmp = b.load(std::memory_order_seq_cst);  // movl [b],[tmp];
    c.store(4, std::memory_order_release);        // movl 4,[c];
}
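If a program does need a weaker store to be ordered before a later load, one option is an explicit std::atomic_thread_fence(std::memory_order_seq_cst) between them. This is a suggestion under stated assumptions and is not part of the original answer. The sketch below uses illustrative names (c, b, count_both_zero_with_fences) and reuses the store-buffering shape. The standard's rules for seq_cst fences (§ 29.3) forbid both threads reading 0 here, even though the stores are only release and the loads only relaxed.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Illustrative names (assumptions), not from the answer.
std::atomic<int> c{0}, b{0};

// Returns how often both threads read 0 although each one issued a
// seq_cst fence between its store and its load.
int count_both_zero_with_fences(int iterations) {
    assert(iterations > 0);
    int both_zero = 0;
    for (int i = 0; i < iterations; ++i) {
        c.store(0, std::memory_order_relaxed);
        b.store(0, std::memory_order_relaxed);
        int r1 = -1, r2 = -1;
        std::thread t1([&r1] {
            c.store(4, std::memory_order_release);
            std::atomic_thread_fence(std::memory_order_seq_cst);  // StoreLoad barrier
            r1 = b.load(std::memory_order_relaxed);
        });
        std::thread t2([&r2] {
            b.store(1, std::memory_order_release);
            std::atomic_thread_fence(std::memory_order_seq_cst);  // StoreLoad barrier
            r2 = c.load(std::memory_order_relaxed);
        });
        t1.join();
        t2.join();
        if (r1 == 0 && r2 == 0) ++both_zero;
    }
    return both_zero;
}
```

The two fences are totally ordered in S, so the load after the later fence must observe the store before the earlier one. As the note in § 29.3 / 8 warns, this orders these particular accesses but does not make the whole program sequentially consistent.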
Also, sequential consistency on x86/x86_64 can be implemented in four ways: http://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html
1. LOAD (without fence) and STORE + MFENCE
2. LOAD (without fence) and LOCK XCHG
3. MFENCE + LOAD and STORE (without fence)
4. LOCK XADD(0) and STORE (without fence)
The first two ways, LOAD and (STORE + MFENCE)/(LOCK XCHG), were reviewed above.
The last two ways, (MFENCE + LOAD)/LOCK XADD and STORE, allow the following reorderings:
STORE-A(seq_cst);
LOAD-C(acquire);
can be reordered to:
LOAD-C(acquire);
STORE-A(seq_cst);
and
STORE-A(seq_cst);
LOAD-C(relaxed);
can be reordered to:
LOAD-C(relaxed);
STORE-A(seq_cst);
PowerPC allows Store-Load reordering (Table 5 - PowerPC): http://www.rdrop.com/users/paulmck/scalability/paper/whymb.2010.06.07c.pdf
Stores Reordered After Loads
I.e. PowerPC code:
STORE-A(seq_cst);
STORE-C(relaxed);
LOAD-C(relaxed);
LOAD-B(seq_cst);
Can be reordered to:
LOAD-C(relaxed);
STORE-A(seq_cst);
STORE-C(relaxed);
LOAD-B(seq_cst);
PowerPC - GCC 4.8: https://godbolt.org/g/xowFD3
C++ & asm code:
#include <atomic>
// Atomic load-store
void test() {
    std::atomic<int> a, b, c;                     // addr: 20, 24, 28
    a.store(2, std::memory_order_seq_cst);        // li r9<-2; sync; stw r9->[a];
    c.store(4, std::memory_order_relaxed);        // li r9<-4; stw r9->[c];
    c.load(std::memory_order_relaxed);            // lwz r9<-[c];
    int tmp = b.load(std::memory_order_seq_cst);  // sync; lwz r9<-[b]; ... isync;
}
By dividing a.store into two parts, it can be reordered to:
#include <atomic>
// Atomic load-store
void test() {
    std::atomic<int> a, b, c;                     // addr: 20, 24, 28
    //a.store(2, std::memory_order_seq_cst);      // part-1: li r9<-2; sync;
    c.load(std::memory_order_relaxed);            // lwz r9<-[c];
    a.store(2, std::memory_order_seq_cst);        // part-2: stw r9->[a];
    c.store(4, std::memory_order_relaxed);        // li r9<-4; stw r9->[c];
    int tmp = b.load(std::memory_order_seq_cst);  // sync; lwz r9<-[b]; ... isync;
}
Here the load from memory lwz r9<-[c]; is executed earlier than the store to memory stw r9->[a];.
The following reordering is also possible on PowerPC:
STORE-A(seq_cst);
STORE-C(relaxed);
can be reordered to:
STORE-C(relaxed);
STORE-A(seq_cst);
Because PowerPC has a weak memory ordering model, it allows Store-Store reordering (Table 5 - PowerPC): http://www.rdrop.com/users/paulmck/scalability/paper/whymb.2010.06.07c.pdf
Stores Reordered After Stores
I.e. on PowerPC a Store can be reordered with another Store, so the previous example can be reordered as follows:
#include <atomic>
// Atomic load-store
void test() {
    std::atomic<int> a, b, c;                     // addr: 20, 24, 28
    //a.store(2, std::memory_order_seq_cst);      // part-1: li r9<-2; sync;
    c.load(std::memory_order_relaxed);            // lwz r9<-[c];
    c.store(4, std::memory_order_relaxed);        // li r9<-4; stw r9->[c];
    a.store(2, std::memory_order_seq_cst);        // part-2: stw r9->[a];
    int tmp = b.load(std::memory_order_seq_cst);  // sync; lwz r9<-[b]; ... isync;
}
Here the store to memory stw r9->[c]; is executed earlier than the store to memory stw r9->[a];.
std::memory_order_seq_cst guarantees that there is no reordering by either the compiler or the CPU. In this case the memory order is the same as if only one instruction were executed at a time.
But compiler optimization confuses the issue: if you turn off -O3, then the fence is there.
With -O3, the compiler can see that the mfence has no consequence in your test program, as the program is too simple.
If, on the other hand, you run it on an Arm like this, you can see the barriers dmb ish.
So if your program is more complex, you might see the mfence in this part of the code, but not if the compiler can analyse the code and reason that it is not needed.