
GCC optimization?

I'm using GCC 4.8.1. I'm trying to benchmark the speed of some code by putting it into nested loops, as in the example below. Whenever I do so, it executes in a minimal amount of time (around .02 seconds), with -O3 or without any optimizations, and regardless of how many iterations there are. Why is that? I'm sure the code is working fine because the values are always correct, and if I use printf inside the loops, then it runs as expected.

int main()
{
    int i,j,k;
    int var;
    int big_num = 1000000;
    int x[1];

    for (i = 0;i<big_num;++i){
        for (j = 0;j<big_num;++j){
            for (k = 0;k<big_num;++k){
               // any short code fragment such as:
               var = i - j + k; 
               x[0] = var;
            }
        }
    }
    return 0;
}

(No longer true with your edited question:) An earlier version of your code declared a single-element array int x[1]; and accessed it with the out-of-bounds index x[1] (the index must be non-negative and less than 1, so only 0 is valid); that is typical undefined behavior, and the compiler may legally emit any kind of code for it.

BTW, GCC 4.9 (on my Debian/Sid/x86-64) rightfully optimizes your code to an empty main, since no useful computation happens. You can check this by compiling with gcc -fverbose-asm -O2 -S and looking at the generated *.s assembly file. If you are really curious about the various internal representations during the optimization passes, compile with -fdump-tree-all; you could also alter the compiler's behavior (or add some inspecting passes), e.g. by extending it with MELT.

You could make your computation meaningful by replacing x[0] = var; with x[0] += var; and by ending your main with a side effect on x[0], e.g. printf("%d\n", x[0]); or return x[0] != 0;. Then the compiler would probably generate a real loop (it might compute the result of the loop at compile time, but I don't think GCC is that clever).

Finally, typical current microprocessors are often out-of-order and superscalar, so they execute more than one instruction per cycle (at a clock frequency of, e.g., at least 2GHz); that means they run several billion basic operations per second. A benchmark generally needs to last more than half a second (otherwise the measurement is not meaningful enough) and should be repeated several times, so your benchmarking code should execute several tens of billions (i.e. more than 10^10) of elementary C operations. And that code needs to be useful (with side effects, or a result used elsewhere), otherwise the compiler will optimize it away. Also, benchmarking code should take some input (otherwise the compiler might do a lot of the computation at compile time). In your case you might initialize big_num like:

#include <stdlib.h>  /* for atoi */

int main (int argc, char **argv) {
  int big_num = (argc > 1) ? atoi(argv[1]) : 1000000;

How do you set optimizations? For me, it works (big_num = 1000):

$ gcc -o x -O0 x.c && time ./x
./x  2.08s user 0.00s system 99% cpu 2.086 total
$ gcc -o x -O1 x.c && time ./x 
./x  0.31s user 0.00s system 99% cpu 0.309 total
$ gcc -o x -O2 x.c && time ./x  
./x  0.00s user 0.00s system 0% cpu 0.000 total

Your code doesn't actually do anything. GCC, like most compilers, is very smart: it can look at the code, determine that it has no visible effects, and remove it entirely.

Your code has no side effects: it doesn't send anything over the network or write any files, so gcc elides it. Modern gcc versions have -fdump-* options that log every phase of the compiler:

$ gcc -O2 -fdump-tree-all elide.c

After that gcc will generate a bunch of output files:

$ ls -1 | head
a.out
elide.c
elide.c.001t.tu
elide.c.003t.original
elide.c.004t.gimple
elide.c.007t.omplower
...

Comparing them may reveal the phase where the code was removed. In my case (GCC 4.8.1), it is the cddce phase. From the GCC source file gcc/tree-ssa-dce.c:

/* Dead code elimination.

References:

    Building an Optimizing Compiler,
    Robert Morgan, Butterworth-Heinemann, 1998, Section 8.9.

    Advanced Compiler Design and Implementation,
    Steven Muchnick, Morgan Kaufmann, 1997, Section 18.10.

Dead-code elimination is the removal of statements which have no
impact on the program's output.  "Dead statements" have no impact
on the program's output, while "necessary statements" may have
impact on the output.

The algorithm consists of three phases:
1. Marking as necessary all statements known to be necessary,
    e.g. most function calls, writing a value to memory, etc;
2. Propagating necessary statements, e.g., the statements
    giving values to operands in necessary statements; and
3. Removing dead statements.  */

You can explicitly defeat the optimizer by marking your variables as volatile:

volatile int i,j,k;
volatile int var;
volatile int big_num = 1000000;
volatile int x[1];

volatile tells the compiler that every access to the variable is observable behavior, so the reads and writes cannot be optimized away.
