Is accessing a static function variable slower than accessing a global variable?

Question

Static local variables are initialised on the first function call:

Variables declared at block scope with the specifier static have static storage duration but are initialized the first time control passes through their declaration (unless their initialization is zero- or constant-initialization, which can be performed before the block is first entered). On all further calls, the declaration is skipped.
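A minimal sketch of the two cases the quote distinguishes. The types `Counter` and `Widget`, the counter `widget_constructions`, and the functions `const_init()` and `dynamic_init()` are hypothetical. The first static has a `constexpr` constructor and a constant argument, so it is constant-initialized and needs no first-call check. The second has a constructor with a side effect, so it must be dynamically initialized on the first call, which typically means a guard flag is tested on every call.

```cpp
#include <cassert>

// Hypothetical type with a constexpr constructor: eligible for constant initialization.
struct Counter {
    constexpr explicit Counter(int start) : value(start) {}
    int value;
};

int widget_constructions = 0;  // counts how often Widget's constructor runs

// Hypothetical type whose constructor has a side effect: needs dynamic initialization.
struct Widget {
    Widget() : value(7) { ++widget_constructions; }
    int value;
};

const Counter& const_init() {
    // Constant initialization: done before the block is entered, so no guard is needed.
    static const Counter c{3};
    return c;
}

const Widget& dynamic_init() {
    // Dynamic initialization: runs on the first call only, typically behind a guard flag.
    static Widget w;
    return w;
}
```

Before the first call to `dynamic_init()`, `widget_constructions` is still 0; after any number of calls it is 1.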

Also, since C++11 there are even more checks:

If multiple threads attempt to initialize the same static local variable concurrently, the initialization occurs exactly once (similar behavior can be obtained for arbitrary functions with std::call_once). Note: usual implementations of this feature use variants of the double-checked locking pattern, which reduces runtime overhead for already-initialized local statics to a single non-atomic boolean comparison. (since C++11)
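The note about double-checked locking can be made concrete with a hand-written approximation of what an implementation might do for `static T obj{};`. This is a sketch only: `std::atomic<bool>` and `std::mutex` stand in for the compiler's guard variable and lock (real compilers call ABI routines such as `__cxa_guard_acquire` on the Itanium C++ ABI instead), and the class `GuardedLocal` and its `get()` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <new>

// Hypothetical hand-rolled equivalent of a function-local `static T obj{};`.
template <typename T>
class GuardedLocal {
public:
    const T& get() {
        // Fast path: one acquire load and one branch once initialization is done.
        if (!initialized_.load(std::memory_order_acquire)) {
            std::lock_guard<std::mutex> lock(mutex_);
            // Re-check under the lock: another thread may have initialized it meanwhile.
            if (!initialized_.load(std::memory_order_relaxed)) {
                ::new (static_cast<void*>(storage_)) T{};
                // If T's constructor throws, the flag stays false and the next call retries.
                initialized_.store(true, std::memory_order_release);
            }
        }
        return *std::launder(reinterpret_cast<const T*>(storage_));
    }

private:
    alignas(T) unsigned char storage_[sizeof(T)];
    std::atomic<bool> initialized_{false};
    std::mutex mutex_;
};
```

After initialization, `get()` costs one acquire load and one branch; on x86-64 an acquire load compiles to an ordinary load, which is why the quote can describe it as a plain boolean comparison. Destruction at program exit is omitted to keep the sketch short.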

At the same time, global variables seem to be initialised on program start (though technically only allocation / deallocation is mentioned on cppreference):

static storage duration. The storage for the object is allocated when the program begins and deallocated when the program ends. Only one instance of the object exists. All objects declared at namespace scope (including global namespace) have this storage duration, plus those declared with static or extern.

So, given the following example:

struct A {
    // complex type...
};
const A& f()
{
    static A local{};
    return local;
}

A global{};
const A& g()
{
    return global;
}

Am I correct to assume that f() has to check whether its variable has been initialised every time it is called, and thus f() will be slower than g()?

Answer 1

You are conceptually correct, of course, but contemporary architectures can deal with this.

A modern compiler and architecture would arrange the pipeline such that the already-initialised branch is assumed. The overhead of initialisation would therefore be one extra pipeline flush, and that's all.

If you're in any doubt, check the assembly.

Answer 2

Yes, it is almost certainly slightly slower. Most of the time, however, it will not matter, and the cost will be outweighed by the "logic and style" benefit.

Technically, a function-local static variable is the same as a global variable, except that its name is not globally known (which is a good thing) and its initialization is guaranteed to happen not only at an exactly specified time, but also only once, and in a thread-safe way.

This means that a function-local static variable must know whether initialization has happened, and thus needs at least one extra memory access and one conditional jump that the global (in principle) doesn't need. An implementation may do something similar for globals, but it need not (and usually doesn't).
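Ignoring thread safety, the extra memory access and conditional jump can be spelled out by hand. This is a minimal sketch of roughly what the check amounts to without any locking (the situation with GCC's `-fno-threadsafe-statics`); `f_unsafe`, `initialized` and `storage` are hypothetical names, and a real compiler generates its own guard variable rather than this code.

```cpp
#include <cassert>
#include <new>

struct A { int value = 0; };  // stand-in for the "complex type"

// Hypothetical manual equivalent of `static A local{};` with no thread safety.
const A& f_unsafe() {
    static bool initialized = false;  // constant-initialized, so it needs no guard itself
    alignas(A) static unsigned char storage[sizeof(A)];
    if (!initialized) {               // the extra memory access + conditional jump
        ::new (static_cast<void*>(storage)) A{};
        initialized = true;           // plain store: racy if two threads get here at once
    }
    return *std::launder(reinterpret_cast<const A*>(storage));
}
```

Every call reads `initialized` and branches on it, whereas `g()` in the question just returns the global. Two threads making the first call simultaneously could both construct the object, which is what the standard's thread-safe initialization rules out.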

Chances are good that the jump is predicted correctly in all cases but two. The first two calls are highly likely to be mispredicted: by default a jump is usually assumed to be taken rather than not (a wrong assumption on the first call), and subsequent jumps are assumed to take the same path as the last one (wrong again on the second call). After that, you should be good to go, with close to 100% correct prediction.
Even a correctly predicted jump isn't free (the CPU can still only start a limited number of instructions every cycle, even assuming they take zero time to complete), but the cost is small. If the memory latency, which may be a couple of hundred cycles in the worst case, can be successfully hidden, the cost almost disappears in pipelining. Also, every access fetches an extra cache line that wouldn't otherwise be needed (the has-been-initialized flag likely isn't stored in the same cache line as the data). Thus you get slightly worse L1 performance (L2 should be big enough that you can say "yeah, so what").

It also has to actually perform a one-time, thread-safe initialization that the global (in principle) doesn't have to do, at least not in a way that you can see. An implementation can do something different, but most just initialize globals before main is entered, and quite often most of that work is done with a memset, or implicitly because the variable is stored in a segment that is zeroed anyway.
Your static variable must be initialized when the initialization code is executed, and this must happen in a thread-safe manner. Depending on how much your implementation sucks, this can be quite expensive. I decided to forfeit the thread-safety feature and always compile with -fno-threadsafe-statics (even though this isn't standard-compliant) after discovering that GCC (which is otherwise an OK all-round compiler) would actually lock a mutex for every static initialization.

Answer 3

From https://en.cppreference.com/w/cpp/language/initialization:

Deferred dynamic initialization
It is implementation-defined whether dynamic initialization happens-before the first statement of the main function (for statics) or the initial function of the thread (for thread-locals), or deferred to happen after.

If the initialization of a non-inline variable (since C++17) is deferred to happen after the first statement of main/thread function, it happens before the first odr-use of any variable with static/thread storage duration defined in the same translation unit as the variable to be initialized.

So a similar check may also have to be done for global variables.

So f() is not necessarily "slower" than g().

Answer 4

g() is not thread-safe, and is susceptible to all sorts of ordering problems. Safety is going to come at a price. There are several ways to pay it:

f(), the Meyers singleton, pays the price on every access. If it is accessed frequently, or accessed during a performance-sensitive section of your code, then it does make sense to avoid f(). Your processor presumably has a finite number of circuits it can devote to branch prediction, and you are forced to read an atomic variable before the branch anyway. It is a tall price to pay continually just to ensure that the initialization happened only once.
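One common way to pay that price only once per hot loop (a general technique, not something this answer prescribes) is to call `f()` a single time and reuse the returned reference, which stays valid because the static object lives until program exit. A sketch with a stand-in `A` and the hypothetical functions `sum_scaled_per_call` and `sum_scaled_hoisted`:

```cpp
#include <cassert>
#include <vector>

struct A {
    A() : scale(3) {}  // user-provided, not constexpr: the static below normally needs a guard
    int scale;
};

const A& f() {
    static A local{};  // Meyers singleton, as in the question
    return local;
}

// Calls f() on every iteration, so the guard may be checked each time.
long sum_scaled_per_call(const std::vector<int>& xs) {
    long total = 0;
    for (int x : xs) total += static_cast<long>(x) * f().scale;
    return total;
}

// Calls f() once; the reference is then used without any further guard check.
long sum_scaled_hoisted(const std::vector<int>& xs) {
    const A& a = f();
    long total = 0;
    for (int x : xs) total += static_cast<long>(x) * a.scale;
    return total;
}
```

Both functions return 18 for `{1, 2, 3}`. Whether the optimizer already hoists the check out of the first loop depends on the compiler and flags, so look at the generated assembly before relying on a difference.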

h(), described below, works very much like g() with an extra indirection, but assumes that h_init() gets called exactly once at the beginning of execution. Preferably, you would define a subroutine that gets called as the first line of main() and that calls every function like h_init() in an absolute order. Hopefully, these objects do not need to be destructed.

Alternatively, if you use GCC, you can annotate h_init() with __attribute__((constructor)). I prefer the explicitness of the static init subroutine, though.

A * h_global = nullptr;
void h_init() { h_global = new A { }; }
A const& h() { return *h_global; }
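A self-contained version of h(), assuming a minimal stand-in for `A` and a hypothetical `init_all()` subroutine that `main()` would call as its first statement. The `assert` only catches a forgotten init call in debug builds; with `NDEBUG` defined, `h()` is just a pointer load and a dereference.

```cpp
#include <cassert>

struct A { int value = 1; };  // stand-in for the "complex type"

A* h_global = nullptr;

// Must run exactly once, before any call to h(). Intentionally never deleted.
void h_init() { h_global = new A{}; }

const A& h() {
    assert(h_global != nullptr && "h_init() must run before h()");  // debug-only check
    return *h_global;
}

// Hypothetical explicit init subroutine: the single place that fixes the init order.
void init_all() {
    h_init();
    // further *_init() calls would go here, in a deliberate order
}
```

In `main()`, call `init_all();` before anything else; every later call to `h()` then returns the same object.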

h2() is just like h(), minus the extra indirection:

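#include <iterator>  // std::begin, std::cbegin
#include <new>       // placement new, std::launder (C++17)
alignas(alignof(A)) char h2_global [sizeof(A)] = { };
void h2_init() { new (std::begin(h2_global)) A { }; }
// std::launder yields a usable pointer to the A created by placement new
A const& h2() { return * std::launder(reinterpret_cast <A const *> (std::cbegin(h2_global))); }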

4 answers

Answer 1: score 15, accepted, 2018-09-06 07:14:16

Answer 2: score 6, 2018-09-06 10:28:01

Answer 3: score 2, 2018-09-06 09:45:56

Answer 4: score 0, 2018-09-12 16:04:12
