Should std::atomic<int*>::load be doing a compare-and-swap loop?
Summary: I had expected that std::atomic<int*>::load with std::memory_order_relaxed would be close to the performance of just loading a pointer directly, at least when the loaded value rarely changes. I saw far worse performance for the atomic load than for a normal load on Visual Studio C++ 2012, so I decided to investigate. It turns out that the atomic load is implemented as a compare-and-swap loop, which I suspect is not the fastest possible implementation.
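For a side-by-side comparison, here is a minimal sketch that puts a plain pointer load and a relaxed atomic pointer load in separate non-inlined functions, so their code can be compared in an assembly listing (for example `g++ -O2 -S`, or `/FA` on MSVC). The names `loadPlain`, `loadRelaxedPtr` and the `NOINLINE` macro are hypothetical and not part of the original test. A relaxed load only has to be atomic, with no ordering constraints, and an aligned 8-byte `mov` is already atomic on x86-64; current GCC and Clang typically emit that single `mov` for both functions.

```cpp
#include <atomic>

// Hypothetical portable "do not inline" marker so each load stays in its
// own function in the assembly listing (MSVC vs GCC/Clang spelling).
#if defined(_MSC_VER)
#define NOINLINE __declspec(noinline)
#else
#define NOINLINE __attribute__((noinline))
#endif

// Baseline: an ordinary, non-atomic pointer load.
NOINLINE int* loadPlain(int* const& p) {
    return p;
}

// Relaxed atomic pointer load: only atomicity is required, no ordering,
// so on x86-64 an aligned 8-byte mov suffices.
NOINLINE int* loadRelaxedPtr(const std::atomic<int*>& p) {
    return p.load(std::memory_order_relaxed);
}
```

Both functions return the same pointer; the interesting difference, if any, shows up only in the generated instructions.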
Question: Is there some reason that std::atomic<int*>::load needs to do a compare-and-swap loop?
Background: I believe that MSVC++ 2012 is doing a compare-and-swap loop on the atomic load of a pointer, based on this test program:
#include <atomic>
#include <iostream>

template<class T>
__declspec(noinline) T loadRelaxed(const std::atomic<T>& t) {
    return t.load(std::memory_order_relaxed);
}

int main() {
    int i = 42;
    char c = 42;
    std::atomic<int*> ptr(&i);
    // Initialized explicitly: before C++20 a default-constructed std::atomic holds an indeterminate value.
    std::atomic<int> integer(i);
    std::atomic<char> character(c);
    std::cout
        << *loadRelaxed(ptr) << ' '
        << loadRelaxed(integer) << ' '
        << loadRelaxed(character) << std::endl;
    return 0;
}
I'm using a __declspec(noinline) function in order to isolate the assembly instructions related to the atomic load. I made a new MSVC++ 2012 project, added an x64 platform, selected the release configuration, ran the program in the debugger and looked at the disassembly. It turns out that both the std::atomic<char> and the std::atomic<int> arguments end up producing the same call to loadRelaxed<int>; this must be something the optimizer did. Here is the disassembly of the two loadRelaxed instantiations that get called:
loadRelaxed<int * __ptr64>
000000013F4B1790 prefetchw [rcx]
000000013F4B1793 mov rax,qword ptr [rcx]
000000013F4B1796 mov rdx,rax
000000013F4B1799 lock cmpxchg qword ptr [rcx],rdx
000000013F4B179E jne loadRelaxed<int * __ptr64>+6h (013F4B1796h)
loadRelaxed<int>
000000013F3F1940 prefetchw [rcx]
000000013F3F1943 mov eax,dword ptr [rcx]
000000013F3F1945 mov edx,eax
000000013F3F1947 lock cmpxchg dword ptr [rcx],edx
000000013F3F194B jne loadRelaxed<int>+5h (013F3F1945h)
The instruction lock cmpxchg is an atomic compare-and-swap, and we see here that the code for atomically loading a char, an int or an int* is a compare-and-swap loop. I also built this code for 32-bit x86, and that implementation is still based on lock cmpxchg.
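To make the mechanism concrete, the sketch below expresses the same loop in portable C++ (`casLoopLoad` is a hypothetical name, not MSVC's library code): read the value, then compare-and-swap it with itself until the CAS succeeds. A successful CAS writes back the value it just compared, so the net effect is a load, but every iteration is a locked read-modify-write, which on x86 requires exclusive ownership of the cache line. Because it writes, the helper needs a non-const reference, whereas std::atomic<T>::load is a const member function.

```cpp
#include <atomic>

// Hypothetical emulation of the disassembled MSVC 2012 load:
// a "load" built from a compare-and-swap of the value with itself.
template <class T>
T casLoopLoad(std::atomic<T>& t) {
    // Initial read, like the `mov rax,qword ptr [rcx]` in the listing.
    T expected = t.load(std::memory_order_relaxed);
    // If t still equals `expected`, store `expected` back (no visible change)
    // and succeed. On failure, compare_exchange_weak copies the current value
    // into `expected`, so the next attempt retries with fresh data.
    while (!t.compare_exchange_weak(expected, expected,
                                    std::memory_order_relaxed)) {
    }
    return expected;
}
```

Called on the same object, `casLoopLoad(ptr)` and `ptr.load(std::memory_order_relaxed)` return the same value; they differ in cost and in the fact that the former writes to the object.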
I do not believe that relaxed atomic loads require compare-and-swap.
In the end this std::atomic implementation was not usable for my purpose, but I still wanted to have the interface, so I made my own std::atomic using MSVC's barrier intrinsics.
This has better performance than the default std::atomic for my use case. You can see the code here. It's supposed to implement the C++11 spec for all the orderings of load and store.
对于加载和存储的所有排序,它应该被实现为C ++ 11规范。 Btw GCC 4.6 is not better in this regard.
Btw GCC 4.6在这方面并不是更好。 I don't know about GCC 4.7.
我不知道GCC 4.7。
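As a hypothetical sketch of the kind of wrapper described above (a std::atomic-like interface whose load and store accept every C++11 ordering), the class below forwards to the GCC/Clang `__atomic` builtins rather than MSVC's barrier intrinsics, so it is not the author's code and will not build with MSVC. The name `SimpleAtomicPtr` is made up for illustration. On x86-64, GCC and Clang typically lower its relaxed and acquire loads to a single `mov`, with no compare-and-swap.

```cpp
#include <atomic>

// Hypothetical std::atomic<T*>-like wrapper (GCC/Clang only).
// load/store take a std::memory_order and map it onto the matching
// __atomic builtin ordering constant.
template <class T>
class SimpleAtomicPtr {
public:
    explicit SimpleAtomicPtr(T* p = nullptr) : ptr_(p) {}
    SimpleAtomicPtr(const SimpleAtomicPtr&) = delete;
    SimpleAtomicPtr& operator=(const SimpleAtomicPtr&) = delete;

    T* load(std::memory_order order = std::memory_order_seq_cst) const {
        switch (order) {
        case std::memory_order_relaxed:
            return __atomic_load_n(&ptr_, __ATOMIC_RELAXED);
        case std::memory_order_consume:  // treated as acquire, as compilers do
        case std::memory_order_acquire:
            return __atomic_load_n(&ptr_, __ATOMIC_ACQUIRE);
        default:  // seq_cst; release/acq_rel are not valid for a load
            return __atomic_load_n(&ptr_, __ATOMIC_SEQ_CST);
        }
    }

    void store(T* p, std::memory_order order = std::memory_order_seq_cst) {
        switch (order) {
        case std::memory_order_relaxed:
            __atomic_store_n(&ptr_, p, __ATOMIC_RELAXED);
            break;
        case std::memory_order_release:
            __atomic_store_n(&ptr_, p, __ATOMIC_RELEASE);
            break;
        default:  // seq_cst; consume/acquire/acq_rel are not valid for a store
            __atomic_store_n(&ptr_, p, __ATOMIC_SEQ_CST);
            break;
        }
    }

private:
    T* ptr_;  // accessed only through the __atomic builtins
};
```

The standard does not allow release orderings on a load or acquire orderings on a store; this sketch simply maps such calls to seq_cst instead of rejecting them.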
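Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.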