How fast is access to atomic variables in C++

My question is: how fast is access to atomic variables in C++ when using the C++0x atomic<> class? What happens at the cache level? Say one thread is only reading the variable: does it need to go down to RAM, or can it just read from the cache of the core it is executing on? Assume the architecture is x86.

I am especially interested in the case where a thread is only reading the variable while no other thread is writing to it at that time: would the penalty be the same as for reading a normal variable? How are atomic variables accessed? Does each read implicitly involve a write as well, as in compare-and-swap? Are atomic variables implemented using compare-and-swap?

The answer is not as simple as you perhaps expect. It depends on the exact CPU model, and it depends on the circumstances as well. The worst case is when you need to perform a read-modify-write operation on a variable and there is a conflict (what exactly counts as a conflict is again CPU-model dependent, but most often it is when another CPU is accessing the same cache line).

See also .NET or Windows Synchronization Primitives Performance Specifications

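If you want raw numbers, the data tables in Agner Fog's optimization manuals should be useful. Likewise, Intel's manuals have sections detailing the latency of memory reads/writes on multicore systems, which should include the bus locking required by atomic writes.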

Atomics use special architecture support to get atomicity without forcing every read and write to go all the way to main memory. Basically, the cache-coherence protocol allows each core to probe the caches of the other cores, so a core finds out about the results of other threads' operations that way.

The exact performance depends on the architecture. On x86, many operations are already atomic in hardware to start with (aligned loads and stores, for example), so making them atomic is essentially free. For the operations that do need extra work, I've seen numbers anywhere from 10 to 100 cycles, depending on the architecture and the operation. For perspective, a read that has to go all the way to main memory typically costs a few hundred cycles, so atomics on cached data are much faster than going straight to memory on nearly all platforms.
