
Compiler optimizations: Where/how can I get a feel for what the payoff is for different optimizations?

In my independent study of various compiler books and web sites, I am learning about many different ways that a compiler can optimize the code that is being compiled, but I am having trouble figuring out how much of a benefit each optimization will tend to give.

How do most compiler writers go about deciding which optimizations to implement first? Or which optimizations are worth the effort and which are not? I realize that this will vary between types of code and even individual programs, but I'm hoping that there is enough similarity between most programs to say, for instance, that one given technique will usually give you a better performance gain than another technique.

I found when implementing textbook compiler optimizations that some of them tended to reverse the improvements made by other optimizations. This entailed a lot of work trying to find the right balance between them.

So there really isn't a good answer to your question. Everything is a tradeoff. Many optimizations work well on one type of code, but are pessimizations for other types. It's like designing a house - if you make the kitchen bigger, the pantry gets smaller.

The real work in building an optimizer is trying out the various combinations, benchmarking the results, and, like a master chef, picking the right mix of ingredients.

Tongue in cheek:

  1. Hubris
  2. Benchmarks
  3. Embarrassment

More seriously, it depends on your compiler's architecture and goals. Here's one person's experience...

Go for the "big payoffs":

  • native code generation
  • register allocation
  • instruction scheduling

Go for the remaining "low hanging fruit" (a small before/after sketch follows this list):

  • strength reduction
  • constant propagation
  • copy propagation
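
For concreteness, here is a before/after sketch (in C++, with made-up names) of what those three transformations do to a toy loop. A real compiler applies them to its intermediate representation rather than to source code; this is only an illustration.

    #include <cstddef>

    // Before: the loop as written by the programmer.
    unsigned sum_before(const unsigned* a, std::size_t n) {
        unsigned sum = 0;
        unsigned scale = 4;            // constant propagation: scale is known to be 4
        for (std::size_t i = 0; i < n; ++i) {
            std::size_t j = i;         // copy propagation: j is only ever a copy of i
            sum += a[j] * scale;       // strength reduction: multiply by 4 becomes a shift
        }
        return sum;
    }

    // After: what the three passes together would conceptually produce.
    unsigned sum_after(const unsigned* a, std::size_t n) {
        unsigned sum = 0;
        for (std::size_t i = 0; i < n; ++i)
            sum += a[i] << 2;          // constant folded in, copy gone, shift instead of multiply
        return sum;
    }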

Keep benchmarking.

Look at the output; fix anything that looks stupid.

It is usually the case that combining optimizations, or even repeating optimization passes, is more effective than you might expect. The benefit is more than the sum of the parts.
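
As a hedged illustration of that interaction (the names below are invented): constant propagation only proves that the condition is false, and dead-code elimination on its own cannot remove the branch, but run one after the other they delete the block entirely, and the function that remains becomes a much better inlining candidate.

    #include <cstdio>

    const bool kTracing = false;               // known constant at compile time

    void expensive_trace(int x) { std::printf("trace %d\n", x); }

    int work(int x) {
        if (kTracing) {                        // constant propagation: condition folds to false
            expensive_trace(x);                // dead-code elimination: the whole branch disappears
        }
        return x * x;                          // what is left is tiny, so inlining becomes attractive
    }

    int main() { return work(3) == 9 ? 0 : 1; }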

You may find that the introduction of one optimization necessitates another. For example, SSA with Briggs-Chaitin register allocation really benefits from copy propagation.

Historically, there are "algorithmic" optimizations from which the code should benefit in most cases, like loop unrolling (and compiler writers should implement those well-documented and well-tested optimizations first).
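
As a sketch of what loop unrolling does, here is a simple reduction written plainly and then hand-unrolled by a factor of four (an illustration of the transformation, not actual compiler output; note that splitting the sum across four accumulators reorders floating-point additions, which compilers only do under options like -ffast-math).

    #include <cstddef>

    double sum_plain(const double* a, std::size_t n) {
        double s = 0.0;
        for (std::size_t i = 0; i < n; ++i)
            s += a[i];
        return s;
    }

    double sum_unrolled(const double* a, std::size_t n) {
        double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
        std::size_t i = 0;
        for (; i + 4 <= n; i += 4) {    // main body: four iterations per trip, less loop overhead
            s0 += a[i];
            s1 += a[i + 1];
            s2 += a[i + 2];
            s3 += a[i + 3];
        }
        for (; i < n; ++i)              // scalar epilogue for the leftover elements
            s0 += a[i];
        return s0 + s1 + s2 + s3;
    }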

Then there are optimizations whose benefit depends on the processor being targeted (like using SIMD instructions on modern CPUs).
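
As a sketch of the processor-specific kind, here is the same sort of reduction written with x86 SSE intrinsics (this assumes an SSE-capable CPU and the <immintrin.h> header; a modern compiler can often auto-vectorize the plain loop on its own, so take it only as an illustration of the shape of a SIMD optimization).

    #include <immintrin.h>
    #include <cstddef>

    float sum_sse(const float* a, std::size_t n) {
        __m128 acc = _mm_setzero_ps();
        std::size_t i = 0;
        for (; i + 4 <= n; i += 4)
            acc = _mm_add_ps(acc, _mm_loadu_ps(a + i));   // four floats per add

        float lanes[4];
        _mm_storeu_ps(lanes, acc);                        // spill the vector accumulator
        float s = lanes[0] + lanes[1] + lanes[2] + lanes[3];

        for (; i < n; ++i)                                // scalar tail
            s += a[i];
        return s;
    }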

See Compiler Optimizations on Wikipedia for a reference.

Finally, various types of optimizations can be evaluated by profiling the code or by accurately timing repeated executions.
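
A minimal timing harness along those lines, using only the standard library; the two lambdas are placeholders for whatever pair of versions you actually want to compare.

    #include <chrono>
    #include <cstdio>

    template <typename F>
    double seconds_for(F&& f, int reps) {
        auto t0 = std::chrono::steady_clock::now();
        for (int r = 0; r < reps; ++r)
            f();                                  // repeated execution smooths out noise
        auto t1 = std::chrono::steady_clock::now();
        return std::chrono::duration<double>(t1 - t0).count();
    }

    int main() {
        volatile long long sink = 0;              // keeps the work from being optimized away
        auto baseline  = [&] { for (long long i = 0; i < 1000000; ++i) sink += i; };
        auto optimized = [&] { sink += 1000000LL * 999999LL / 2; };

        std::printf("baseline : %.6f s\n", seconds_for(baseline, 100));
        std::printf("optimized: %.6f s\n", seconds_for(optimized, 100));
    }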

I'm not a compiler writer, but why not just incrementally optimize portions of your code, profiling all the while?

My optimization scheme usually goes:

1) make sure the program is working

2) find something to optimize

3) optimize it

4) compare the test results with the output from step 1; if they differ, the optimization is actually a breaking change (a small sketch of this check follows below).

5) compare the timing difference

Incrementally, I'll get it faster.
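
A tiny sketch of step 4, assuming the routine is pure so its results can be compared directly; sum_plain and sum_fast here are just stand-ins for your own before and after versions.

    #include <cassert>
    #include <numeric>
    #include <vector>

    long long sum_plain(const std::vector<int>& v) {      // the version from step 1
        long long s = 0;
        for (int x : v) s += x;
        return s;
    }

    long long sum_fast(const std::vector<int>& v) {       // the "optimized" version from step 3
        return std::accumulate(v.begin(), v.end(), 0LL);
    }

    int main() {
        std::vector<int> v(10000);
        std::iota(v.begin(), v.end(), -5000);             // include negative values in the test data
        assert(sum_plain(v) == sum_fast(v));              // step 4: if these differ, it's a breaking change
        return 0;
    }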

I choose which portions to focus on by using a profiler. I'm not sure what extra information you'll garner by asking the compiler writers.

This really depends on what you are compiling. There was a reasonably good discussion about this on the LLVM mailing list recently; it is of course somewhat specific to the optimizers they have available. They use abbreviations for a lot of their optimization passes, so if you're not familiar with the acronyms they are tossing around, you can look at their passes page for documentation. Ultimately you can spend years reading academic papers on this subject.

This is one of those topics where academic papers (ACM perhaps?) may be one of the better sources of up-to-date information. The best thing to do if you really want to know could be to create some code in unoptimized form and some in the form that the optimization would take (loops unrolled, etc) and actually figure out where the gains are likely to be using a compiler with optimizations turned off.

It is worth noting that in many cases, compiler writers will NOT spend much time, if any, on ensuring that their libraries are optimized. Benchmarks tend to de-emphasize or even ignore library differences, presumably because you can just use different libraries. For example, the permutation algorithms in GCC are asymptotically* less efficient than they could be when trying to permute complex data. This relates to incorrectly making deep copies during calls to swap functions. This will likely be corrected in most compilers with the introduction of rvalue references (part of the C++0x standard). Rewriting the STL to be much faster is surprisingly easy.

*This assumes the size of the class being permuted is variable. E.g. permuting a vector of vectors of ints would slow down if the inner vectors were larger.
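
To illustrate the point (this is not GCC's actual library code), compare a copy-based swap of one "complex" element with the move-based swap that rvalue references make possible:

    #include <utility>
    #include <vector>

    using Row = std::vector<int>;     // a "complex" element: permuting these is what gets expensive

    void swap_by_copy(Row& a, Row& b) {
        Row tmp = a;                  // deep copy of every int in a
        a = b;                        // deep copy of every int in b
        b = tmp;                      // and a third deep copy
    }

    void swap_by_move(Row& a, Row& b) {
        Row tmp = std::move(a);       // steals a's buffer: O(1)
        a = std::move(b);
        b = std::move(tmp);           // three pointer exchanges, no element copies
    }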

One optimization that can give big speedups but is rarely done is to insert memory prefetch instructions. The trick is to figure out what memory the program will want far enough in advance, never prefetch the wrong memory, and never overflow the D-cache.
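
A hedged sketch of the idea using GCC/Clang's __builtin_prefetch intrinsic; the lookahead distance below is an arbitrary guess that would have to be tuned per machine, and getting it wrong just pollutes the D-cache.

    #include <cstddef>

    double sum_with_prefetch(const double* a, std::size_t n) {
        const std::size_t kAhead = 16;                    // made-up lookahead distance; tune it
        double s = 0.0;
        for (std::size_t i = 0; i < n; ++i) {
            if (i + kAhead < n)
                __builtin_prefetch(&a[i + kAhead]);       // hint: this data will be needed soon
            s += a[i];
        }
        return s;
    }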
