
GCC optimization?

I'm using GCC 4.8.1. I'm trying to benchmark the speed of some code by putting it into nested loops, as in the example below. Whenever I do so, it executes in a minimal amount of time (around 0.02 seconds), with -O3 or without any optimizations, and regardless of how many iterations there are. Is there a reason for this? I'm sure it's working fine because the values are always correct, and if I use printf within the loops, it runs as expected.

int main()
{
    int i,j,k;
    int var;
    int big_num = 1000000;
    int x[1];

    for (i = 0;i<big_num;++i){
        for (j = 0;j<big_num;++j){
            for (k = 0;k<big_num;++k){
               // any short code fragment such as:
               var = i - j + k; 
               x[0] = var;
            }
        }
    }
    return 0;
}

No longer true for your edited question: your code was declaring a single-element array int x[1]; and accessing it as x[1], which is an out-of-bounds index (the index must be non-negative and less than 1, so it can only be 0). This is typical undefined behavior, and the compiler can legally optimize it by emitting any kind of code.

BTW, GCC 4.9 (on my Debian/Sid/x86-64) is (rightfully) optimizing your code to an empty main, since no useful computation happens. You can check this by compiling with gcc -fverbose-asm -O2 -S and looking at the generated *.s assembly file. If you are really curious about the various internal representations during the optimization passes, compile with -fdump-tree-all. You could also alter the compiler's behavior (or add some inspecting passes), e.g. by extending it with MELT.

You could make your computation meaningful by initializing x[0] to 0, replacing x[0] = var; with x[0] += var; and ending your main with a side effect on x[0], e.g. printf("%d\n", x[0]); or return x[0] != 0;. Then the compiler would probably generate a loop (it might compute the result of the loop at compile time, but I don't think that GCC is clever enough).
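A minimal sketch of that change, not the asker's actual code: the helper name accumulate is made up for illustration, and the total is kept in a long long because summing i - j + k in an int would overflow quickly, which is itself undefined behavior.

```c
/* Hypothetical helper (not from the question): the same loop nest, but the
   value is accumulated and returned so that the caller can use it.
   Assumes n is small (well below 60000) so the total fits in a long long. */
long long accumulate(int n)
{
    int i, j, k;
    long long sum = 0;   /* initialized, unlike x[0] in the question */

    for (i = 0; i < n; ++i) {
        for (j = 0; j < n; ++j) {
            for (k = 0; k < n; ++k) {
                sum += i - j + k;   /* accumulation instead of x[0] = var */
            }
        }
    }
    return sum;
}
```

The work only stays alive if the caller does something observable with the result, for instance printing accumulate(big_num) with printf("%lld\n", ...) at the end of main. Even then the compiler may simplify the loops, so it is worth checking the generated assembly.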

Finally, typical current microprocessors are often out-of-order and superscalar, so they execute more than one instruction per cycle (with a clock frequency of at least 2 GHz, for example). They therefore run several billion basic operations per second. You generally need a benchmark to last more than half a second (otherwise the measurement is not meaningful enough), and you should repeat it several times. So you need to write benchmarking code that executes several tens of billions (i.e. more than 10^10) of elementary C operations. You also need that code to be useful (with side effects, or with its result used elsewhere), otherwise the compiler will optimize it away. In addition, benchmarking code should take some input (otherwise the compiler might do a lot of the computation at compile time). In your case you might initialize big_num like this:

#include <stdlib.h>  /* for atoi */

int main (int argc, char**argv) {
  int big_num = (argc>1)?atoi(argv[1]):1000000;
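Putting those points together, a benchmark could be organized roughly as in the sketch below. The helper names bound_from_args, work and time_once are assumptions made for illustration, and clock() is only one possible way to measure CPU time. The sum uses unsigned arithmetic so that wrap-around for large inputs is well defined.

```c
#include <stdlib.h>  /* atoi */
#include <time.h>    /* clock, CLOCKS_PER_SEC */

/* Hypothetical helper: take the loop bound from the command line so that it
   is not known at compile time; use the fallback when no argument is given. */
int bound_from_args(int argc, char **argv, int fallback)
{
    return (argc > 1) ? atoi(argv[1]) : fallback;
}

/* The code being measured. Its result is returned, so it is "useful". */
unsigned long long work(int n)
{
    unsigned long long sum = 0;
    long long i, j, k;

    for (i = 0; i < n; ++i)
        for (j = 0; j < n; ++j)
            for (k = 0; k < n; ++k)
                sum += (unsigned long long)(i - j + k);
    return sum;
}

/* Time a single run; the result is stored through the pointer and the
   elapsed CPU time in seconds is returned. */
double time_once(int n, unsigned long long *result)
{
    clock_t start, end;

    start = clock();
    *result = work(n);
    end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

In main you would call time_once(bound_from_args(argc, argv, 1000), &result) several times, print both the result and the timings, and pick a bound large enough that each run lasts more than half a second.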

How do you set optimizations? For me, it works (big_num = 1000):

$ gcc -o x -O0 x.c && time ./x
./x  2.08s user 0.00s system 99% cpu 2.086 total
$ gcc -o x -O1 x.c && time ./x 
./x  0.31s user 0.00s system 99% cpu 0.309 total
$ gcc -o x -O2 x.c && time ./x  
./x  0.00s user 0.00s system 0% cpu 0.000 total

Your code doesn't actually do anything. GCC, like most compilers, is very smart: it can look at the code, determine that it has no visible effects, and remove it entirely.

Your code has no side effects: it doesn't send anything over the network or write any files, so gcc elides it. Modern gcc versions have -fdump-* options that let you log every phase of the compiler:

$ gcc -O2 -fdump-tree-all elide.c

After that, gcc generates a bunch of output files:

$ ls -1 | head
a.out
elide.c
elide.c.001t.tu
elide.c.003t.original
elide.c.004t.gimple
elide.c.007t.omplower
...

Comparing them may reveal the phase in which the code was removed. In my case (GCC 4.8.1), it is the cddce phase. From the GCC source file gcc/tree-ssa-dce.c:

/* Dead code elimination.

References:

    Building an Optimizing Compiler,
    Robert Morgan, Butterworth-Heinemann, 1998, Section 8.9.

    Advanced Compiler Design and Implementation,
    Steven Muchnick, Morgan Kaufmann, 1997, Section 18.10.

Dead-code elimination is the removal of statements which have no
impact on the program's output.  "Dead statements" have no impact
on the program's output, while "necessary statements" may have
impact on the output.

The algorithm consists of three phases:
1. Marking as necessary all statements known to be necessary,
    e.g. most function calls, writing a value to memory, etc;
2. Propagating necessary statements, e.g., the statements
    giving values to operands in necessary statements; and
3. Removing dead statements.  */
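The three phases can be illustrated on a toy representation. This is only a sketch, not GCC's implementation: struct stmt, mark_necessary and count_dead are invented names, and it assumes straight-line code in which statement i defines value i and every operand is defined by an earlier statement.

```c
#define NO_OPERAND (-1)

/* Toy statement: operands are the indices of the statements defining them. */
struct stmt {
    int op1, op2;          /* defining statement index, or NO_OPERAND */
    int has_side_effect;   /* nonzero for e.g. a call or an observable store */
    int necessary;         /* filled in by mark_necessary() */
};

/* Phases 1 and 2: mark the inherently necessary statements, then propagate
   necessity to the statements that compute their operands. Because operands
   are always defined earlier, a single backward pass is sufficient here. */
void mark_necessary(struct stmt *s, int n)
{
    int i;

    for (i = 0; i < n; ++i)
        s[i].necessary = s[i].has_side_effect;

    for (i = n - 1; i >= 0; --i) {
        if (!s[i].necessary)
            continue;
        if (s[i].op1 != NO_OPERAND)
            s[s[i].op1].necessary = 1;
        if (s[i].op2 != NO_OPERAND)
            s[s[i].op2].necessary = 1;
    }
}

/* Phase 3, simplified: report how many statements would be removed. */
int count_dead(const struct stmt *s, int n)
{
    int i, dead = 0;

    for (i = 0; i < n; ++i)
        if (!s[i].necessary)
            ++dead;
    return dead;
}
```

A program in which no statement has a side effect, like the loop body in the question, ends up with every statement counted as dead.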

You can explicitly defeat the optimizer by marking your variables as volatile:

volatile int i,j,k;
volatile int var;
volatile int big_num = 1000000;
volatile int x[1];

volatile tells the compiler that every access to the variable, including each write to its memory cell, is a side effect that must not be optimized away.
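As a small illustration, the sketch below keeps only the accumulator volatile. The function name volatile_sum is an assumption made for this example.

```c
/* Hypothetical helper: the running total lives in a volatile object, so the
   compiler must perform every read and write of it and cannot delete the
   loop or replace it with a precomputed constant. */
int volatile_sum(int n)
{
    volatile int sink = 0;
    int i;

    for (i = 0; i < n; ++i)
        sink += i;   /* one volatile read and one volatile write per pass */
    return sink;
}
```

Keep in mind that volatile also changes what is being measured: the forced memory accesses become part of the timing, so results may differ from what the same code would cost in a normal optimized build.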


 