Do C++ compilers optimize sequential static variable reading?
Do C++ compilers optimize sequential reads of the same static variable within a function scope when the variable is accessed through a const reference? That is, will the CPU read the value from its address in static data just once, keep it in CPU cached memory, and reuse it on the assumption that the value is immutable?
In other words: are static variables implicitly treated as volatile, as if there were other threads and the value might magically change?
I ask because if the CPU does not cache the value, sequential reads from the static variable's address may hurt performance. Is it better in this case to manually copy the value into a variable on the stack so that it will be in the CPU cache?
    // Data is assumed to be defined elsewhere.
    class Singleton
    {
        // some code
    public:
        Data data; // public so that foo() below can read it

        static Singleton& instance()
        {
            static Singleton inst;
            return inst;
        }
    };

    int func(const Data& param);

    int foo(int N)
    {
        int result = 0;
        for (int i = 0; i < N; ++i)
        {
            // will the compiler move the read out of the loop, so the value is cached by the CPU?
            const auto& data = Singleton::instance().data;
            result += func(data);
        }
        return result;
    }
If the compiler can prove beyond doubt that the value will not change between accesses, then under the as-if rule it may consolidate multiple reads into one.
But that is often hard to prove for a variable with static storage duration, because code in other translation units (func() in your example) might modify it. So without seeing what func() does, the compiler is forced to reload data in each iteration. Similarly, when func() itself is compiled, its argument needs to be reloaded every time.
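To make the aliasing problem concrete, here is a minimal sketch in which the callee writes to the very object it receives by const reference. The names g_data, sum_reads and the body of func() are hypothetical stand-ins, not the asker's code; in a real program func() would sit in another translation unit where the compiler cannot see the write.

```cpp
// Hypothetical stand-in for the question's Data type.
struct Data { int value = 1; };

Data g_data;  // object with static storage duration

// Imagine this body living in another translation unit: the caller's
// compiler would then have no way to know that it modifies g_data.
int func(const Data& param)
{
    int seen = param.value;  // read through the const reference
    g_data.value += 1;       // legal: the referenced object itself is not const
    return seen;
}

int sum_reads(int n)
{
    int result = 0;
    const Data& data = g_data;  // read-only view, not a promise of immutability
    for (int i = 0; i < n; ++i)
        result += func(data);   // each call must observe the updated value
    return result;
}
```

If the compiler reused the first loaded value here, sum_reads(3) would return 3 instead of 6, which is why an opaque call forces a reload.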
Then there could also be other threads that modify data. According to the C++ memory model rules, those changes do not have to become visible until a synchronization event, such as a memory fence, a mutex operation, or an atomic acquire/release, so the mere possibility of other threads does not force the compiler to reload the value.
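If another thread really does write the data, the synchronization has to be spelled out. The following is a minimal sketch of a release/acquire pair using std::atomic; Shared, publish and try_consume are hypothetical names chosen for illustration, and the writer and reader are assumed to run on different threads.

```cpp
#include <atomic>

// Hypothetical shared state, for illustration only.
struct Shared
{
    int payload = 0;                 // plain data, written before publishing
    std::atomic<bool> ready{false};  // the synchronization point
};

Shared g_shared;

// Writer side: store the data, then publish it with release semantics.
void publish(int value)
{
    g_shared.payload = value;
    g_shared.ready.store(true, std::memory_order_release);
}

// Reader side: the acquire load pairs with the release store, so a reader
// that sees ready == true also sees the payload written before it.
bool try_consume(int& out)
{
    if (!g_shared.ready.load(std::memory_order_acquire))
        return false;
    out = g_shared.payload;
    return true;
}
```

Between such synchronization points the compiler is free to keep a plain value in a register, which is exactly the optimization the question asks about.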
So yes, reducing the scope of a variable can often improve performance. It is much easier to prove that a local variable cannot be modified from "outside" than it is for a global variable.
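The manual copy the question asks about could look like the sketch below. shared_data(), foo_local_copy and the body of func() are assumptions made up for this example. Note that the copy changes behavior if the callee modifies the static object, so it is only appropriate when the value is meant to stay fixed for the duration of the loop.

```cpp
// Hypothetical stand-in for the question's Data type.
struct Data { int value = 5; };

// Function-local static, in the spirit of Singleton::instance().
Data& shared_data()
{
    static Data d;
    return d;
}

// Stand-in for the opaque callee from the question.
int func(const Data& param)
{
    return param.value * 2;
}

int foo_local_copy(int n)
{
    // One read of the static object, done before the loop. The copy is a
    // const local, so no other code is allowed to modify it.
    const Data local = shared_data();

    int result = 0;
    for (int i = 0; i < n; ++i)
        result += func(local);
    return result;
}
```

Whether this is actually faster depends on the size of Data and on what the optimizer already does, so it is worth measuring rather than assuming.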
In the end, the best way to achieve good performance is to give the compiler as complete a picture as possible of the code being compiled. In the provided example, func() can be added to the same translation unit. Then, if it doesn't make other "unknown" calls and can be inlined, the optimizer's analysis passes can eliminate the unnecessary duplicate reads. For large applications, LTO (Link Time Optimization) is another way to improve performance, as it broadens the optimizer's view.
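A sketch of the same-translation-unit variant follows. The definition of Data and the body of func() are placeholders invented for the example; the point is only that the optimizer can now see that func() writes nothing, so it is free to hoist the read out of the loop.

```cpp
// Placeholder definition; the real Data is not shown in the question.
struct Data { int value = 3; };

class Singleton
{
public:
    Data data;

    static Singleton& instance()
    {
        static Singleton inst;
        return inst;
    }
};

// Visible in this translation unit and trivially inlinable:
// the optimizer can see that it only reads its argument.
inline int func(const Data& param)
{
    return param.value + 1;
}

int foo(int n)
{
    int result = 0;
    for (int i = 0; i < n; ++i)
    {
        const auto& data = Singleton::instance().data;
        result += func(data);
    }
    return result;
}
```

When func() has to stay in a separate source file, GCC and Clang accept -flto at both compile and link time to give the optimizer a similar whole-program view.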
C++ says objects are only volatile if you mark them as such. Nothing is magically marked volatile for you. You can usually assume the compiler will do whatever is fastest while still meeting the requirements of the C++ specification. So it almost certainly caches the accesses whenever that helps.
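For contrast, here is a small sketch of what marking an object volatile changes. The variable and function names are hypothetical. Both functions return the same result; the difference is in what the compiler is allowed to do with the loads.

```cpp
int plain_value = 7;              // ordinary static-duration object
volatile int volatile_value = 7;  // explicitly marked volatile

// The compiler may load plain_value once and reuse it for both reads.
int read_plain_twice()
{
    int a = plain_value;
    int b = plain_value;
    return a + b;
}

// Each read of a volatile object is an observable access,
// so both loads must be performed.
int read_volatile_twice()
{
    int a = volatile_value;
    int b = volatile_value;
    return a + b;
}
```

Comparing the generated assembly of the two functions at -O2 is a quick way to see the effect on a given compiler.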