
c++ : Can the compiler optimize this code segment?

void foo(const int constant)
{
    for(int i = 0; i < 1000000; i++) {
        // do stuff
        if(constant < 10) {              // Condition is tested million times :(
            // inner loop stuff
        }
    }
}

On every iteration of the outer loop the value of "constant" is checked. However, constant never changes, so a lot of CPU time is wasted testing the condition constant < 10 over and over again. A human would realize after the first few passes that constant never changes and would avoid re-checking it. Does the compiler notice this and optimize it, or is the repeated if check unavoidable?

Personally, I think the problem is unavoidable. Even if the compiler put the comparison before the outer loop and set some kind of boolean variable "skip_inner_stuff" this variable would still have to be checked for each pass of the outer for loop.

What are your thoughts on the matter? Is there a more efficient way to write the above code segment that would avoid the problem?

The optimization you describe is called loop unswitching. It has been a standard part of optimizing compilers for many years, but if you want to make sure your compiler performs it, compile your sample code with optimization enabled (e.g. -O2 in gcc) and check the generated code.

However, in cases where the compiler cannot prove that a piece of code is invariant throughout the loop (e.g. a call to an external function whose definition is not available at compile time), manually hoisting the code out of the loop can yield a very big performance boost.
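
As a minimal sketch of manual hoisting, assume a hypothetical function threshold() that would normally live in another translation unit, so the compiler cannot tell whether it returns the same value each time. The names threshold, count_below_slow and count_below_hoisted are illustrative and do not come from the question.

```cpp
#include <vector>

// Hypothetical stand-in for a function defined in another translation unit.
// In real code the compiler may not see its body, so it cannot prove that
// the call returns the same value on every iteration.
int threshold() { return 10; }

// The call is repeated on every iteration.
int count_below_slow(const std::vector<int>& values)
{
    int count = 0;
    for (int v : values) {
        if (v < threshold()) {
            ++count;
        }
    }
    return count;
}

// Manually hoisted: the call is made once, before the loop.
// Only valid if threshold() really is invariant and has no side effects.
int count_below_hoisted(const std::vector<int>& values)
{
    const int limit = threshold();
    int count = 0;
    for (int v : values) {
        if (v < limit) {
            ++count;
        }
    }
    return count;
}
```

Both functions return the same result; the hoisted version simply makes the invariance explicit instead of relying on the compiler to discover it.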

The compiler can optimize the code, but you can't expect it to do magic tricks on your code.

The optimization strongly depends on your code and how it is used. For example, if you use foo like this:

foo(12345);

The compiler can optimize the code a great deal. It can even compute the result at compile time.
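
One way to see this for yourself, rather than trusting the optimizer, is to write the loop as a constexpr function and check it with static_assert. This is only a sketch: it assumes C++14 or later, the name count_kept and the parameter n are made up for illustration, and the iteration count is reduced to 1000 to stay well within compilers' constant-evaluation limits.

```cpp
// Hypothetical constexpr variant of a counting loop (illustrative names).
// x is incremented on every pass and decremented again when constant < 10.
constexpr int count_kept(int constant, int n)
{
    int x = 0;
    for (int i = 0; i < n; i++) {
        x++;
        if (constant < 10) {
            x--;
        }
    }
    return x;
}

// These are evaluated by the compiler; no loop runs at run time for them.
static_assert(count_kept(12345, 1000) == 1000, "condition false: every increment is kept");
static_assert(count_kept(5, 1000) == 0, "condition true: every increment is undone");
```

With a plain (non-constexpr) function the compiler is allowed, but not required, to do the same folding after inlining a call with a literal argument.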

But if you use it like this:

int k;
cin >> k;
foo(k);

In this case it cannot get rid of the inner if, because the value is only provided at run time.

I wrote some sample code and compiled it with MinGW/GCC-4.8.0:

void foo(const int constant)
{
    int x = 0;
    for (int i = 0; i < 1000000; i++)
    {
        x++;
        if (constant < 10)
        {
            x--;
        }
    }
    cout << x << endl;
}

int main()
{
    int k;
    cin >> k;
    foo(k);
}

Let's look at the generated assembly code:

004015E1  MOV EAX,0F4240                 // loop counter = 1000000
004015E6  MOV EBP,ESP
004015E8  XOR EDX,EDX                    // x = 0
004015EA  PUSH ESI
004015EB  PUSH EBX
004015EC  SUB ESP,10
004015EF  MOV EBX,DWORD PTR SS:[EBP+8]   // EBX = constant
004015F2  XOR ECX,ECX                    // set ECX to 0
004015F4  CMP EBX,0A                     // compare constant with 10
          ^^^^^^^^^^
004015F7  SETGE CL                       // CL = 1 if constant >= 10, else 0
004015FA  ADD EDX,ECX                    // x += ECX
004015FC  SUB EAX,1                      // counter--
004015FF  JNE SHORT 004015F2             // loop while counter is not zero

As you can see, the inner if still exists in the code. See CMP EBX,0A.
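
Read back into C++, the loop in the listing behaves roughly like the sketch below. The comparison is still executed on every pass, but there is no conditional jump for it: SETGE produces 0 or 1 and that value is added to x. The name foo_branchless is illustrative, and the function returns x instead of printing it so the result is easy to check.

```cpp
// Rough C++ equivalent of the loop in the listing (a hand-written
// illustration, not compiler output).
int foo_branchless(const int constant)
{
    int x = 0;
    for (int i = 1000000; i != 0; i--) {
        // CMP + SETGE: contributes 1 if constant >= 10, otherwise 0
        x += (constant >= 10);
    }
    return x;
}
```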

Again, it strongly depends on the code inside the loop.

Others have covered the relevant compiler optimizations: loop unswitching which moves the test outside the loop and provides two separate loop bodies; and code inlining that will in some cases provide the compiler with the actual value of constant so that it can remove the test, and either execute 'inner loop stuff' unconditionally or remove it entirely.

Also be aware that quite aside from anything the compiler does, modern CPU designs actually do something similar to "A human would realize after the first few passes that constant never changes". It's called dynamic branch prediction.

The key point is that checking an integer is incredibly cheap, and even taking a branch can be very cheap. What's potentially expensive is mis-predicted branches. Modern CPUs use various strategies to guess which way a branch will go, but all of those strategies will quickly start correctly predicting a branch that goes the same way a million times in a row.

What I don't know is whether modern CPUs are smart enough to spot that constant is a loop invariant and do the full loop unswitching in microcode. But assuming correct branch prediction, the loop unswitch is probably a minor optimization anyway. The more specific the processor family targeted by the compiler, the more it knows about the quality of its branch predictor, and the more likely it is that the compiler can determine whether the additional benefit of loop unswitching is worth the code bloat.

Of course there are still minimal CPUs, where the compiler has to provide all the cleverness. The CPU in your PC is not one of them.

You could optimise it by hand:

void foo(const int constant)
{
    if (constant < 10) {
        for(int i = 0; i < 1000000; i++) {
            // do stuff

            // inner loop stuff here
        }
    } else {
        for(int i = 0; i < 1000000; i++) {
            // do stuff

            // NO inner loop stuff here
        }
    }
}

I don't know whether most compilers would do something like this, but it doesn't seem like too much of a stretch.
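
If duplicating the loop body by hand is unappealing, one alternative is to pass the condition as a template parameter, so that each instantiation has the test resolved at compile time. This is only a sketch: foo_impl, do_stuff and inner_stuff are hypothetical names standing in for the elided bodies, and foo returns a value here purely so the behaviour can be checked.

```cpp
// Hypothetical placeholders for "do stuff" and "inner loop stuff".
inline void do_stuff(long& acc) { acc += 2; }
inline void inner_stuff(long& acc) { acc -= 1; }

// RunInner is a compile-time constant inside each instantiation,
// so the `if` below is trivially removable in both versions.
template <bool RunInner>
long foo_impl()
{
    long acc = 0;
    for (int i = 0; i < 1000000; i++) {
        do_stuff(acc);
        if (RunInner) {
            inner_stuff(acc);
        }
    }
    return acc;
}

// The run-time test happens once, outside both loops.
long foo(const int constant)
{
    return (constant < 10) ? foo_impl<true>() : foo_impl<false>();
}
```

The loop body is written once, and the compiler generates the two specialised loops that the hand-written version spells out.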

A good compiler might optimize it.

Compilers optimize based on cost analysis. A good compiler should thus estimate the cost of each alternative (with and without hoisting) and pick whichever is cheaper.

It means that if the code in the inner part is big, it might not be worth optimizing because the duplicated code could lead to instruction cache thrashing. On the other hand, if it is cheap, then it can be hoisted.

If it shows up in the profiler because it has not been optimized, the compiler messed up.

A good compiler will optimize that (when optimizations are enabled).

If you are using GCC, you could:

  • compile with optimization and assembly code generation with

     gcc -Wall -O2 -fverbose-asm -S source.c 

    then look (with some editor, or a pager like less) into the generated assembly code source.s

  • ask GCC to dump a lot (hundreds!) of intermediate files and look inside the intermediate gimple representation in it

     gcc -Wall -O2 -fdump-tree-all -c source.c 
  • use MELT and its probe to look interactively inside the gimple.

Get into the habit of always asking for warnings with -Wall from gcc (or g++ if compiling C++ code).

BTW, in practice, such an optimization ("loop invariant code hoisting", as the other answer explains) is essential, because this kind of intermediate code occurs very often, e.g. after function inlining (imagine several calls to foo being inlined).

Practically all modern compilers perform this optimization. If you think the compiler should not perform it, you should declare the variable as "volatile".
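
A sketch of the volatile suggestion, assuming the value is copied into a volatile local (the name foo_volatile is illustrative, and the function returns x so the result can be checked). Each read of a volatile object has to be performed as written, so the compiler cannot treat the comparison as loop-invariant.

```cpp
int foo_volatile(const int constant)
{
    volatile int c = constant;  // every read of c must actually be performed
    int x = 0;
    for (int i = 0; i < 1000000; i++) {
        x++;
        if (c < 10) {           // re-read and re-tested on every pass
            x--;
        }
    }
    return x;
}
```

This is mainly useful for experiments and benchmarks; in ordinary code it only makes the loop slower.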
