
C++ atomic load ordering efficiency

I have a variable in memory that is updated in thread A and read in other threads. The reader only cares whether the value is non-zero. I am guaranteed that once the value is incremented, it never goes back to zero. Does it make sense to optimize as below? In other words, on the reader side, I don't need a "fence" once my condition is satisfied.

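#include <atomic>

std::atomic<int> counter{0};

// writer (thread A only):
void increment()
{
    // counter + 1 performs an atomic load of the current value; only thread A writes
    counter.store(counter + 1, std::memory_order_release);
}

// reader:
bool iszero()
{
    // fast path: relaxed load once the value is already non-zero
    if (counter.load(std::memory_order_relaxed) > 0) return false;
    // memory fence only if condition not yet reached
    return (counter.load(std::memory_order_acquire) == 0);
}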

First, if you've not actually tried using the default (sequentially consistent) atomics, measured the performance of your app, profiled it, and observed them causing a performance problem, I'd suggest turning back now.

However, if you really do need to start reasoning about relaxed atomics...


Your code is not guaranteed to do what you expect, although it will almost certainly work on x86.

I'm guessing that you're using this to guard the publication of some other non-atomic data.

In that case, you need the guarantee that if you read a non-zero value in the reader thread, then the side effects on non-atomic memory locations (i.e. initializing the data you're publishing) that you made in the writer thread prior to the store will be visible to the reader thread.

Reading non-zero with std::memory_order_relaxed does not synchronize with the std::memory_order_release store, so your code above does not have this guarantee.

To get the behaviour I've described, you need to use std::memory_order_acquire. If you're on x86, then acquire doesn't produce any memory fence instructions, so the only way it will differ in performance from memory_order_relaxed is by preventing some compiler optimizations.
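As a minimal sketch of that, assuming the counter guards a single non-atomic value: the names payload and run_demo are hypothetical and do not appear in your question, and the one-writer, one-reader setup is only there to make the example complete. The reader uses one acquire load, so seeing a non-zero counter also makes the earlier write to payload visible.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int payload = 0;               // hypothetical non-atomic data being published
std::atomic<int> counter{0};   // the counter from the question

// Writer (thread A only): publish with a release store.
void increment()
{
    // Single writer, so a relaxed read of the old value is enough here.
    counter.store(counter.load(std::memory_order_relaxed) + 1,
                  std::memory_order_release);
}

// Reader: one acquire load. Observing a non-zero value synchronizes with
// the release store that wrote it.
bool iszero()
{
    return counter.load(std::memory_order_acquire) == 0;
}

// Hypothetical driver, meant to be called once: runs one writer and one
// reader and returns the payload value the reader observed.
int run_demo()
{
    int seen = 0;
    std::thread reader([&seen] {
        while (iszero()) {
            std::this_thread::yield();   // wait until the writer has published
        }
        seen = payload;                  // safe: ordered after the acquire load
        assert(seen == 42);
    });
    std::thread writer([] {
        payload = 42;                    // initialize the data first...
        increment();                     // ...then publish it
    });
    writer.join();
    reader.join();
    return seen;
}
```

Calling run_demo() from main should return 42. With the relaxed fast path from your question, the same read of payload would be a data race whenever the relaxed load is the one that sees the non-zero value.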
