Compiling with all -O3 flags manually has different result than literally specifying “-O3”

Question

I've got a very peculiar problem that I would like to get to the bottom of. According to http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html, the optimization level -O3 is just a set of optimization flags. So I tried compiling a program with g++ program.c -o program -fauto-inc-dec -fcompare-elim -... where I listed all the optimizations in -O3 manually. Then I tried just g++ program.c -o program -O3 and discovered that the latter binary was much faster, which means the manual flag list is not equivalent to -O3. Any idea why this is happening? We observed this behavior with multiple programs, and also with -O1 and -O2.

Answer 1

The Optimize Options page isn't necessarily complete, I've found. You can find what seems to be the exact set of options GCC applied to a particular program by compiling with -S -fverbose-asm and examining the .s file the compiler generates.

For example, on a program I compiled locally, GCC reported it used the following flags:

# GNU C (GCC) version 4.8.0 (x86_64-unknown-linux-gnu)
#   compiled by GNU C version 4.8.0, GMP version 4.3.2, MPFR version 3.0.0-p3, MPC version 0.8.2
# GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
# options passed:  -I . -I .. -I /usr/include/SDL -D linux -D _GNU_SOURCE=1
# -D _REENTRANT -D JZINTV_VERSION_MAJOR=0x01 -D JZINTV_VERSION_MINOR=0x00
# gfx/gfx_scale.c -msse -mtune=generic -march=x86-64
# -auxbase-strip gfx/gfx_scale.s -ggdb3 -O6 -Wall -Wextra -Wshadow
# -Wpointer-arith -Wbad-function-cast -Wcast-qual -Wc++-compat
# -Wmissing-declarations -Wmissing-prototypes -Wstrict-prototypes -std=c99
# -fverbose-asm -flto -fomit-frame-pointer -fprefetch-loop-arrays
# options enabled:  -faggressive-loop-optimizations
# -fasynchronous-unwind-tables -fauto-inc-dec -fbranch-count-reg
# -fcaller-saves -fcombine-stack-adjustments -fcommon -fcompare-elim
# -fcprop-registers -fcrossjumping -fcse-follow-jumps -fdefer-pop
# -fdelete-null-pointer-checks -fdevirtualize -fdwarf2-cfi-asm
# -fearly-inlining -feliminate-unused-debug-types -fexpensive-optimizations
# -fforward-propagate -ffunction-cse -fgcse -fgcse-after-reload -fgcse-lm
# -fgnu-runtime -fguess-branch-probability -fhoist-adjacent-loads -fident
# -fif-conversion -fif-conversion2 -findirect-inlining -finline
# -finline-atomics -finline-functions -finline-functions-called-once
# -finline-small-functions -fipa-cp -fipa-cp-clone -fipa-profile
# -fipa-pure-const -fipa-reference -fipa-sra -fira-hoist-pressure
# -fira-share-save-slots -fira-share-spill-slots -fivopts
# -fkeep-static-consts -fleading-underscore -fmath-errno -fmerge-constants
# -fmerge-debug-strings -fmove-loop-invariants -fomit-frame-pointer
# -foptimize-register-move -foptimize-sibling-calls -foptimize-strlen
# -fpartial-inlining -fpeephole -fpeephole2 -fpredictive-commoning
# -fprefetch-loop-arrays -free -freg-struct-return -fregmove
# -freorder-blocks -freorder-functions -frerun-cse-after-loop
# -fsched-critical-path-heuristic -fsched-dep-count-heuristic
# -fsched-group-heuristic -fsched-interblock -fsched-last-insn-heuristic
# -fsched-rank-heuristic -fsched-spec -fsched-spec-insn-heuristic
# -fsched-stalled-insns-dep -fschedule-insns2 -fshow-column -fshrink-wrap
# -fsigned-zeros -fsplit-ivs-in-unroller -fsplit-wide-types
# -fstrict-aliasing -fstrict-overflow -fstrict-volatile-bitfields
# -fsync-libcalls -fthread-jumps -ftoplevel-reorder -ftrapping-math
# -ftree-bit-ccp -ftree-builtin-call-dce -ftree-ccp -ftree-ch
# -ftree-coalesce-vars -ftree-copy-prop -ftree-copyrename -ftree-cselim
# -ftree-dce -ftree-dominator-opts -ftree-dse -ftree-forwprop -ftree-fre
# -ftree-loop-distribute-patterns -ftree-loop-if-convert -ftree-loop-im
# -ftree-loop-ivcanon -ftree-loop-optimize -ftree-parallelize-loops=
# -ftree-partial-pre -ftree-phiprop -ftree-pre -ftree-pta -ftree-reassoc
# -ftree-scev-cprop -ftree-sink -ftree-slp-vectorize -ftree-slsr -ftree-sra
# -ftree-switch-conversion -ftree-tail-merge -ftree-ter
# -ftree-vect-loop-version -ftree-vectorize -ftree-vrp -funit-at-a-time
# -funswitch-loops -funwind-tables -fvar-tracking
# -fvar-tracking-assignments -fvect-cost-model -fverbose-asm
# -fzero-initialized-in-bss -m128bit-long-double -m64 -m80387
# -maccumulate-outgoing-args -malign-stringops -mfancy-math-387
# -mfp-ret-in-387 -mglibc -mieee-fp -mlong-double-80 -mmmx -mno-sse4
# -mpush-args -mred-zone -msse -msse2 -mtls-direct-seg-refs

That's quite the haul of flags...
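
To compare listings like this between an -O3 build and a manual-flags build, a small helper can pull the flag names out of the header. This is only a sketch: the function name enabled_flags and the file names in the comments are made up for illustration, and it assumes a GCC version whose -fverbose-asm header contains an "# options enabled:" section, as the 4.8.0 output above does.

```shell
# Print the flags listed under "# options enabled:" in a .s file produced
# with -S -fverbose-asm, one flag per line, in their original order.
# Assumption: the header layout matches the GCC 4.8.0 output shown above.
enabled_flags() {
    awk '
        on && !/^#/           { exit }      # first non-comment line ends the header
        /^# options enabled:/ { on = 1 }    # start of the section we want
        on                    { sub(/^# (options enabled:)?/, ""); print }
    ' "$1" | tr -s ' ' '\n' | grep -- '^-'
}

# Possible usage (file names are hypothetical):
#   g++ -O3 -S -fverbose-asm program.c -o with_O3.s
#   g++ <your manual flag list> -S -fverbose-asm program.c -o manual.s
#   enabled_flags with_O3.s > o3.txt
#   enabled_flags manual.s  > manual.txt
#   diff o3.txt manual.txt
```

Any flag that shows up on only one side of the diff is a candidate for the difference in speed. Keep in mind that the GCC manual also says most optimizations are disabled completely when no -O level is given, even if individual optimization flags are specified, so the binaries can still behave differently when the two lists look similar.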
