
Global variables slow down code

I was messing around with the worst code I could write (basically trying to break things), and I noticed that this piece of code:

for(int i = 0; i < N; ++i)
    tan(tan(tan(tan(tan(tan(tan(tan(x++))))))));
std::cout << x;

where N is a global variable, runs significantly slower than:

int N = 10000;
for(int i = 0; i < N; ++i)
    tan(tan(tan(tan(tan(tan(tan(tan(x++))))))));
std::cout << x;

What happens with a global variable that makes it run slower?

tl;dr: The local version keeps N in a register, the global version doesn't. Declare constants with const and it'll be faster no matter how you declare it.


Here's the example code I used:

#include <iostream>
#include <math.h>
void first(){
  int x=1;
  int N = 10000;
  for(int i = 0; i < N; ++i)
    tan(tan(tan(tan(tan(tan(tan(tan(x++))))))));
  std::cout << x;
}
int N=10000;
void second(){
  int x=1;
  for(int i = 0; i < N; ++i)
    tan(tan(tan(tan(tan(tan(tan(tan(x++))))))));
  std::cout << x;
}
int main(){
  first();
  second();
}

(named test.cpp).

To look at the generated assembler code, I ran g++ -S test.cpp.

I got a huge file, but with some smart searching (I searched for tan), I found what I wanted:

From the first function:

Ltmp2:
    movl    $1, -4(%rbp)
    movl    $10000, -8(%rbp) ; N is here !!!
    movl    $0, -12(%rbp)    ;initial value of i is here
    jmp LBB1_2       ;goto the 'for' code logic
LBB1_1:             ;the loop is this segment
    movl    -4(%rbp), %eax
    cvtsi2sd    %eax, %xmm0
    movl    -4(%rbp), %eax
    addl    $1, %eax
    movl    %eax, -4(%rbp)
    callq   _tan
    callq   _tan
    callq   _tan
    callq   _tan
    callq   _tan        
    callq   _tan
    callq   _tan
    movl    -12(%rbp), %eax
    addl    $1, %eax
    movl    %eax, -12(%rbp) 
LBB1_2:
    movl    -12(%rbp), %eax ;load i
    movl    -8(%rbp), %ecx  ;load N from its stack slot
    cmpl    %ecx, %eax  ;comparing N and i here
    jl  LBB1_1      ;if less, then go into loop code
    movl    -4(%rbp), %eax

From the second function:

Ltmp13:
    movl    $1, -4(%rbp)    ;x
    movl    $0, -8(%rbp)    ;i
    jmp LBB5_2
LBB5_1:             ;loop is here
    movl    -4(%rbp), %eax
    cvtsi2sd    %eax, %xmm0
    movl    -4(%rbp), %eax
    addl    $1, %eax
    movl    %eax, -4(%rbp)
    callq   _tan
    callq   _tan
    callq   _tan
    callq   _tan
    callq   _tan
    callq   _tan
    callq   _tan
    movl    -8(%rbp), %eax
    addl    $1, %eax
    movl    %eax, -8(%rbp)
LBB5_2:
    movl    _N(%rip), %eax  ;loading N from globals at every iteration, instead of keeping it in a register
    movl    -8(%rbp), %ecx

So from the assembler code you can see that in the local version N is read from the function's own stack frame, where nothing else can touch it (so an optimizing build is free to keep it in a register during the entire calculation), whereas in the global version N is reread from the global at each iteration.

I imagine the main reason why this happens is things like threading: the compiler can't be sure that N isn't modified.
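If N is not supposed to change while the loop runs, one way to say so is to copy it into a local before the loop. The following is only a minimal sketch of that idea; the names g_N and run_with_snapshot are made up for illustration and are not from the code above, and the tan chain is shortened.

```cpp
#include <cmath>

int g_N = 10000;  // hypothetical global, standing in for N in the question

// Reads the global once and loops on a local copy, so the loop bound is a
// plain local that no other function can reach. Returns the final value of x.
int run_with_snapshot() {
    const int n = g_N;   // single read of the global
    int x = 1;
    double sink = 0.0;   // accumulates the tan results so they are used
    for (int i = 0; i < n; ++i)
        sink += std::tan(std::tan(static_cast<double>(x++)));
    (void)sink;          // silence "unused" warnings
    return x;
}
```

The trade-off is that a change made to g_N while the loop is running is no longer observed, which is exactly the freedom the compiler does not have on its own.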

If you add a const to the declaration of N (const int N=10000), it'll be even faster than the local version though:

    movl    -8(%rbp), %eax
    addl    $1, %eax
    movl    %eax, -8(%rbp)
LBB5_2:
    movl    -8(%rbp), %eax
    cmpl    $9999, %eax ;i <= 9999 (with jle) is the same test as i < 10000
    jle LBB5_1

N is replaced by a constant.
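A small sketch of the same idea with constexpr, which guarantees the value is known at compile time. The names kN and count_iterations are hypothetical and the loop body is reduced to an increment to keep the example short.

```cpp
// Hypothetical compile-time constant standing in for "const int N = 10000".
constexpr int kN = 10000;

// The compiler can check the value itself, which a plain global never allows.
static_assert(kN == 10000, "loop bound is known at compile time");

// Loops up to the constant bound and returns the final value of x.
int count_iterations() {
    int x = 1;
    for (int i = 0; i < kN; ++i)  // the bound can be emitted as an immediate
        ++x;
    return x;
}
```

Because kN cannot change, it does not matter whether it is declared at global or local scope.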

The global version can't be optimized to keep N in a register.

I did a little experiment with the question and the answer of @rtpg.

Experimenting with the question

In the file main1.h, the global N variable:

int N = 10000;

Then, in the main1.c file, 1000 computations of the situation:

#include <stdio.h>
#include <sys/time.h>
#include <math.h>
#include "main1.h"



extern int N;

int main(){

        int k = 0;
        struct timeval static_start, static_stop;
        int x = 0;

        int y = 0;
        struct timeval start, stop;
        int M = 10000;

        while(k <= 1000){

                gettimeofday(&static_start, NULL);
                for (int i=0; i<N; ++i){
                        tan(tan(tan(tan(tan(tan(tan(tan(x++))))))));
                }
                gettimeofday(&static_stop, NULL);

                gettimeofday(&start, NULL);
                for (int j=0; j<M; ++j){
                        tan(tan(tan(tan(tan(tan(tan(tan(y++))))))));
                }
                gettimeofday(&stop, NULL);

                int first_interval = static_stop.tv_usec - static_start.tv_usec;
                int last_interval = stop.tv_usec - start.tv_usec;

                if(first_interval >=0 && last_interval >= 0){
                        printf("%d, %d\n", first_interval, last_interval);
                }

                k++;
        }

        return 0;
}

The results are shown in the following histogram (frequency/microseconds):

[Figure: histogram comparing the output times of the two methods]

The red boxes are the for loop bounded by the global variable (N), and the transparent green ones are the for loop bounded by M (non-global).

There is evidence to suspect that the extern global variable is a little slower.
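The measurement above subtracts only tv_usec, so any run that crosses a one-second boundary yields a negative value and is thrown away by the >= 0 check. A possible alternative, sketched here with std::chrono::steady_clock instead of gettimeofday (a substitution, not what the experiment used), avoids that; time_loop_us is a hypothetical helper name and the tan chain is shortened.

```cpp
#include <chrono>
#include <cmath>

// Times one run of the loop with a monotonic clock and returns the elapsed
// time in microseconds. "bound" plays the role of N or M.
long long time_loop_us(int bound) {
    using clock = std::chrono::steady_clock;
    volatile double sink = 0.0;  // volatile so the tan calls are not dropped
    int x = 0;
    const auto start = clock::now();
    for (int i = 0; i < bound; ++i)
        sink = std::tan(std::tan(static_cast<double>(x++)));
    const auto stop = clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start)
        .count();
}
```

Calling it once with the global and once with a local bound gives two samples that can be printed side by side as in the original loop, with no samples discarded.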

Experimenting with the answer

The reasoning of @rtpg is very strong. In this sense, a global variable could be slower.

Speed of accessing local vs. global variables in gcc/g++ at different optimization levels

To test this premise, I used a register global variable. This was my main1.h with the global variable:

int N asm ("myN") = 10000;

The new results histogram:

[Figure: results for the register global variable]

Conclusion: there is a performance improvement when the global variable is in a register. There is no "global" or "local" variable problem. The performance depends on how the variable is accessed.

I'm assuming the optimizer doesn't know the contents of the tan function when compiling the above code.

I.e., what tan does is unknown; all the compiler knows is to put the arguments in place, jump to some address, and clean up afterwards.

In the global variable case, the compiler doesn't know what tan does to N. In the local case, there are no "loose" pointers or references to N that tan could legitimately get at, so the compiler knows what values N will take.
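To make that concrete, here is a small sketch in which the called function really does change the global. The names g_bound, opaque_call, iterations_with_global and iterations_with_local are invented for this illustration; opaque_call stands in for any function whose body the compiler cannot see.

```cpp
int g_bound = 10;  // hypothetical global loop bound

// Stand-in for an external function. Changing a global is perfectly legal
// for such a function, so the caller cannot assume g_bound stays fixed.
void opaque_call() {
    if (g_bound > 5) --g_bound;
}

// The bound is the global itself: it has to be re-read on every iteration,
// and the loop ends early because opaque_call lowers it.
int iterations_with_global() {
    int count = 0;
    for (int i = 0; i < g_bound; ++i) {
        opaque_call();
        ++count;
    }
    return count;
}

// The bound is a local copy: opaque_call has no way to reach it, so the
// trip count is fixed when the loop starts.
int iterations_with_local() {
    const int n = g_bound;
    int count = 0;
    for (int i = 0; i < n; ++i) {
        opaque_call();
        ++count;
    }
    return count;
}
```

Starting from g_bound == 10, the first function runs 5 iterations and the second runs 10, so the two loops are not interchangeable and the compiler may not treat them as if they were.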

The compiler can flatten (unroll) the loop: completely (one flat block of 10000 lines), partially (a loop of length 100 with 100 lines per iteration), not at all (a loop of length 10000 with 1 line per iteration), or anything in between.
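As a rough picture of what partial flattening looks like when written by hand, assuming a trivial loop body (an increment) in place of the tan chain; rolled and unrolled_by_4 are hypothetical names, and a real compiler chooses its own unroll factor.

```cpp
// Rolled form: one body per iteration, n iterations.
int rolled(int n) {
    int x = 0;
    for (int i = 0; i < n; ++i)
        ++x;
    return x;
}

// Partially unrolled form: four bodies per iteration, then a remainder loop
// for the leftover 0 to 3 iterations. Same result as rolled(n).
int unrolled_by_4(int n) {
    int x = 0;
    int i = 0;
    for (; i + 4 <= n; i += 4) {
        ++x; ++x; ++x; ++x;
    }
    for (; i < n; ++i)  // remainder
        ++x;
    return x;
}
```

The transformation is only valid if n cannot change during the loop, which is easy to establish for a local and hard for a global.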

The compiler knows way more when your variables are local, because when they are global it has very little knowledge about how they change or who reads them. So few assumptions can be made.

Amusingly, this is also why it is hard for humans to reason about globals.

I think this may be a reason: since global variables live in static storage (the data segment) rather than in the function's stack frame or a register, your code needs to access that memory each time. Maybe that is why the code runs slowly.
