

Performance cost to multiple OpenMP threads reading (not writing) a shared variable?

In OpenMP (I am using C++), is there a performance cost if you have a shared (or even global) variable that is being repeatedly read (not written) by multiple threads? I am aware that if they were writing to the variable, this would be incorrect. I am asking specifically about reading only - is there a potential performance cost if multiple threads are repeatedly reading the same variable?

If the variable (more precisely, the memory location) is only read by all threads, you are basically fine both in terms of correctness and performance. Cache coherence protocols have a "shared" state, so the value can be cached on multiple cores at the same time.
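For instance, here is a minimal sketch (a hypothetical example, assuming a compiler with OpenMP support, e.g. GCC/Clang with -fopenmp) in which every thread repeatedly reads one shared, read-only value:

    #include <cstdio>

    int main() {
        const double factor = 3.14159;   // shared, read-only value
        double total = 0.0;

        // Every thread reads 'factor' on each iteration but never writes it,
        // so each core can keep its own cached copy in the "shared" state.
        #pragma omp parallel for reduction(+ : total)
        for (int i = 0; i < 1000000; ++i) {
            total += factor * i;
        }

        std::printf("total = %f\n", total);
        return 0;
    }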

However, you should also avoid writing data that sits on the same cache line as the variable, because such a write would invalidate the cached copies held by the other cores (false sharing). On a NUMA system, you additionally have to consider that reading certain memory regions may be more expensive for some cores/threads.
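As an illustration, a minimal sketch of padding per-thread data onto separate cache lines so a writer does not invalidate lines other threads are using (the names PaddedCounter and kCacheLine are made up for this example, and 64 bytes is an assumed cache-line size; C++17 is assumed so the over-aligned type works with std::vector):

    #include <cstdio>
    #include <vector>
    #include <omp.h>

    constexpr std::size_t kCacheLine = 64;   // assumed cache-line size in bytes

    struct alignas(kCacheLine) PaddedCounter {
        long value = 0;                      // each counter occupies its own cache line
    };

    int main() {
        const int nthreads = omp_get_max_threads();
        std::vector<PaddedCounter> counters(nthreads);

        #pragma omp parallel
        {
            const int tid = omp_get_thread_num();
            for (int i = 0; i < 1000000; ++i) {
                counters[tid].value += 1;    // writes stay on this thread's own line
            }
        }

        long total = 0;
        for (const auto& c : counters) total += c.value;
        std::printf("total = %ld\n", total);
        return 0;
    }

Without the alignas padding, adjacent counters would share a cache line and every increment would bounce that line between cores.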

If you're only reading, then you have no safety issues. Everything will work fine. By definition, you don't have race conditions. You don't need to do any locking, so no high-contention problems can occur. You can test thread safety at run time using the Clang ThreadSanitizer.

On the other hand, there are some performance issues to be aware of. Try to avoid false sharing by having each thread work on a chunk of data that is consecutive in memory. That way, when the CPU loads a cache line, the whole line is useful to that thread, and it does not have to go back to main memory as often. Accessing main memory is very expensive (at least hundreds of times slower) compared to accessing the CPU cache.
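For example, a minimal sketch (hypothetical, assuming OpenMP support) where schedule(static) hands each thread one contiguous block of the array, so the cache lines it loads are fully consumed by that thread:

    #include <cstdio>
    #include <vector>

    int main() {
        const int n = 1 << 20;
        std::vector<double> data(n, 1.0);
        double sum = 0.0;

        // schedule(static) splits the index range into contiguous chunks,
        // so each thread streams through consecutive memory.
        #pragma omp parallel for schedule(static) reduction(+ : sum)
        for (int i = 0; i < n; ++i) {
            sum += data[i];
        }

        std::printf("sum = %f\n", sum);
        return 0;
    }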

Good luck!
