
Can there be memory contention even if there are no locks?

I am writing a multithreaded program in which a number of std::async calls spawn a fixed set of threads for the duration of the entire program. Each thread works off the same const BigData structure on a read-only basis. There are frequent, random reads from const BigData, but the threads are otherwise totally independent. Can one reasonably expect to get perfect scaling, or is there a slowdown to be expected from the additional memory accesses?
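For concreteness, here is a minimal sketch of that pattern; the contents of BigData and the process function are placeholders I am assuming, not taken from the actual program:

#include <cstddef>
#include <functional>
#include <future>
#include <vector>

// Placeholder for the shared, read-only structure.
struct BigData {
  std::vector<double> values = std::vector<double>(1 << 20, 1.0);
};

// Each worker only reads from `data`, so no locks are needed.
double process(const BigData& data, std::size_t stride) {
  double sum = 0.0;
  for (std::size_t i = 0; i < data.values.size(); i += stride + 1)
    sum += data.values[i];  // read-only access
  return sum;
}

int main() {
  const BigData data{};  // built once, then shared by all threads
  std::vector<std::future<double>> results;
  for (std::size_t t = 0; t < 8; ++t)  // fixed number of worker threads
    results.push_back(std::async(std::launch::async, process, std::cref(data), t));
  for (auto& f : results)
    f.get();  // join; whether this scales depends on the memory traffic
}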

EDIT: After some profiling, this seems to be the culprit:

// Real is the project's floating-point type (defined elsewhere).
class Point {
  // Each call to these operators constructs and returns a temporary Point.
  friend Point operator+(const Point& lhs, const Point& rhs) noexcept {
    return Point{lhs.x + rhs.x, lhs.y + rhs.y, lhs.z + rhs.z};
  }
  friend Point operator-(const Point& lhs, const Point& rhs) noexcept {
    return Point{lhs.x - rhs.x, lhs.y - rhs.y, lhs.z - rhs.z};
  }

public:
  Point() noexcept;
  Point(const Real& x, const Real& y, const Real& z) noexcept
    : x{x}, y{y}, z{z} {}

private:
  Real x{0};
  Real y{0};
  Real z{0};
};

After refactoring my code to avoid unnecessary calls to operator+ and operator-, I seem to get better scaling.
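The question doesn't show the refactor itself, so purely as an illustration of the idea (Real defined as double and the distanceSquared helper are assumptions of mine): instead of building temporary Point objects with operator- and then combining them, do the component arithmetic in one pass:

using Real = double;  // assumption: Real is a floating-point alias in the real code

class Point {
public:
  Point(Real x, Real y, Real z) noexcept : x{x}, y{y}, z{z} {}

  // Before: Point d = a - b; followed by combining d's components,
  // which constructs a temporary Point on every call.
  // After: one pass over the components, no temporary constructed.
  friend Real distanceSquared(const Point& a, const Point& b) noexcept {
    const Real dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
  }

private:
  Real x{0}, y{0}, z{0};
};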

Yes, there can be a slowdown. Main memory (RAM) bandwidth is limited, and if you have multiple cores reading a lot of data quickly, you may saturate the memory bus. The maximum memory bandwidth is typically tens of gigabytes per second (see the specification page for your specific processor, e.g. the i9-9900K, which lists 41.6 GB/s).
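As a rough back-of-the-envelope check (the per-thread figure here is assumed purely for illustration): eight threads each pulling around 5 GB/s would together demand roughly 40 GB/s, essentially saturating that 41.6 GB/s ceiling, at which point additional threads mostly wait on memory rather than compute.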

As well, all cores on one physical package share a single L3 cache, so if you are reading some data more than once you may get fewer cache hits as your threads push each other's data out of L3 (which is the largest cache).

If you want to know how much slowdown there is from certain configurations, you have only one choice: test them. Consider adding prefetch instructions to your code if you know ahead of time what memory you are likely to need, especially if your access pattern is non-sequential.
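As a sketch of the prefetch idea, GCC and Clang expose the __builtin_prefetch builtin (MSVC has _mm_prefetch in <xmmintrin.h>); the lookahead distance and the index-driven access pattern below are assumptions for illustration:

#include <cstddef>
#include <vector>

// Gather table entries at non-sequential indices, prefetching a few iterations ahead.
double gather(const std::vector<double>& table, const std::vector<std::size_t>& indices) {
  constexpr std::size_t lookahead = 16;  // tuning parameter; only measurement tells you the right value
  double sum = 0.0;
  for (std::size_t i = 0; i < indices.size(); ++i) {
    if (i + lookahead < indices.size())
      // arguments: address, 0 = prefetch for read, 1 = low temporal locality
      __builtin_prefetch(&table[indices[i + lookahead]], 0, 1);
    sum += table[indices[i]];  // the actual random read
  }
  return sum;
}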


 