Should std::atomic<int*>::load be doing a compare-and-swap loop?

Question

Summary: I had expected std::atomic<int*>::load with std::memory_order_relaxed to perform close to a direct pointer load, at least when the loaded value rarely changes. On Visual Studio C++ 2012 the atomic load was far slower than a normal load, so I decided to investigate. It turns out that the atomic load is implemented as a compare-and-swap loop, which I suspect is not the fastest possible implementation.

Question: Is there some reason that std::atomic<int*>::load needs to do a compare-and-swap loop?

Background: I believe that MSVC++ 2012 is doing a compare-and-swap loop on the atomic load of a pointer, based on this test program:

#include <atomic>
#include <iostream>

template<class T>
__declspec(noinline) T loadRelaxed(const std::atomic<T>& t) {
  return t.load(std::memory_order_relaxed);
}

int main() {
  int i = 42;
  char c = 42;
  std::atomic<int*> ptr(&i);
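  std::atomic<int> integer(i);     // initialized so the load reads a defined value
  std::atomic<char> character(c);  // likewise; this is what `c` was declared for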
  std::cout
    << *loadRelaxed(ptr) << ' '
    << loadRelaxed(integer) << ' '
    << loadRelaxed(character) << std::endl;
  return 0;
}

I'm using a __declspec(noinline) function to isolate the assembly instructions related to the atomic load. I made a new MSVC++ 2012 project, added an x64 platform, selected the release configuration, ran the program in the debugger and looked at the disassembly. It turns out that the std::atomic<char> and std::atomic<int> arguments both end up calling the same loadRelaxed<int>; this must be something the optimizer did. Here is the disassembly of the two loadRelaxed instantiations that get called:

loadRelaxed<int * __ptr64>

000000013F4B1790  prefetchw   [rcx]  
000000013F4B1793  mov         rax,qword ptr [rcx]  
000000013F4B1796  mov         rdx,rax  
000000013F4B1799  lock cmpxchg qword ptr [rcx],rdx  
000000013F4B179E  jne         loadRelaxed<int * __ptr64>+6h (013F4B1796h)

loadRelaxed<int>

000000013F3F1940  prefetchw   [rcx]  
000000013F3F1943  mov         eax,dword ptr [rcx]  
000000013F3F1945  mov         edx,eax  
000000013F3F1947  lock cmpxchg dword ptr [rcx],edx  
000000013F3F194B  jne         loadRelaxed<int>+5h (013F3F1945h)

The instruction lock cmpxchg is an atomic compare-and-swap, and we see here that the code for atomically loading a char, an int or an int* is a compare-and-swap loop. I also built this code for 32-bit x86, and that implementation is still based on lock cmpxchg.
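For readers who prefer source to disassembly, here is a rough portable C++ sketch of the two load strategies being compared. It is not the MSVC library source. The names loadViaCas, loadPlain and checkLoadsAgree are made up for illustration, and the CAS version only approximates the shape of the loop in the listing above.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper: emulates a load with a compare-and-swap loop,
// mirroring the mov / lock cmpxchg / jne sequence in the disassembly.
template <class T>
T loadViaCas(std::atomic<T>& a) {
  T expected = a.load(std::memory_order_relaxed);  // initial guess
  // Try to replace `expected` with itself. On failure the call stores the
  // value it actually found into `expected`, and we retry.
  while (!a.compare_exchange_strong(expected, expected,
                                    std::memory_order_relaxed,
                                    std::memory_order_relaxed)) {
  }
  return expected;
}

// Hypothetical helper: the load the question expected, a single relaxed read.
template <class T>
T loadPlain(const std::atomic<T>& a) {
  return a.load(std::memory_order_relaxed);
}

// With no concurrent writers, both strategies must return the same value.
inline void checkLoadsAgree() {
  int i = 42;
  std::atomic<int*> ptr(&i);
  std::atomic<int> integer(7);
  assert(loadViaCas(ptr) == loadPlain(ptr));
  assert(loadViaCas(integer) == loadPlain(integer));
}
```

Note that loadViaCas has to take a non-const reference: a compare-and-swap is a read-modify-write operation even when it writes back the same value, so concurrent readers contend for the cache line in a way that plain loads do not. That is one plausible explanation for the slowdown described above.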

Question: Is there some reason that std::atomic<int*>::load needs to do a compare-and-swap loop?

Answer 1

I do not believe that relaxed atomic loads require compare-and-swap. In the end this std::atomic implementation was not usable for my purpose, but I still wanted to have the interface, so I made my own std::atomic using MSVC's barrier intrinsics. This has better performance than the default std::atomic for my use case. You can see the code here. It's supposed to be implemented to the C++11 spec for all the orderings for load and store. By the way, GCC 4.6 is no better in this regard. I don't know about GCC 4.7.
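The linked code is not reproduced here, so the following is only a guess at the general shape of such a wrapper, not the answerer's implementation. It assumes x86/x64, where an aligned word-sized load or store is a single instruction and the hardware already gives loads acquire ordering and stores release ordering, so only the compiler has to be stopped from reordering. SketchAtomic and checkSketch are hypothetical names. std::atomic_signal_fence is used as a portable stand-in for MSVC's _ReadWriteBarrier() compiler barrier. Strictly speaking, the C++ standard treats concurrent volatile accesses like these as a data race, so this relies on platform behaviour rather than on the language.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical wrapper, assuming x86/x64 and a word-sized, aligned T.
template <class T>
class SketchAtomic {
 public:
  explicit SketchAtomic(T v) : value_(v) {}

  // Relaxed load: one plain read; volatile keeps the compiler from
  // eliding or duplicating it.
  T loadRelaxed() const { return value_; }

  // Acquire load: plain read, then a compiler-only barrier so later
  // accesses are not hoisted above it (stand-in for _ReadWriteBarrier()).
  T loadAcquire() const {
    T v = value_;
    std::atomic_signal_fence(std::memory_order_acquire);
    return v;
  }

  // Release store: compiler-only barrier, then a plain write.
  void storeRelease(T v) {
    std::atomic_signal_fence(std::memory_order_release);
    value_ = v;
  }

 private:
  volatile T value_;
};

// Single-threaded sanity check of the interface.
inline void checkSketch() {
  int i = 42;
  SketchAtomic<int*> p(&i);
  assert(*p.loadRelaxed() == 42);
  int j = 7;
  p.storeRelease(&j);
  assert(*p.loadAcquire() == 7);
}
```

A sequentially consistent store would additionally need a real hardware fence or a locked instruction, which this sketch leaves out.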
