GCC 4.6.3 on Linux: the optimizations listed as enabled for -O3 differ from those actually applied to the code. Does the order of optimizations affect compilation?
I'm facing a problem with GCC 4.6.3 for which I can't find any logical solution or explanation. I'm working on a project porting an embedded firmware application with an OS to a Linux-based application. The application has a whole bunch of unit tests that can be activated via arguments to check the sanity of the code/features.
When I compile in debug, everything works 100% and all unit tests pass. However, I had issues with the release build (with -O3 optimizations). I managed to isolate the problematic file. The file comes from an external package not coded by us, and we do not want to change it at all.
I took GCC's documentation to get all the optimizations included in -O3. Here is what I got:
-fauto-inc-dec
-fcprop-registers
-fdce
-fdefer-pop
-fdse
-fguess-branch-probability
-fif-conversion2
-fif-conversion
-finline-small-functions
-fipa-pure-const
-fipa-reference
-fmerge-constants
-fsplit-wide-types
-ftree-builtin-call-dce
-ftree-ccp
-ftree-ch
-ftree-copyrename
-ftree-dce
-ftree-dominator-opts
-ftree-dse
-ftree-fre
-ftree-sra
-ftree-ter
-funit-at-a-time
-fomit-frame-pointer
-fthread-jumps
-falign-functions
-falign-jumps
-falign-loops
-falign-labels
-fcaller-saves
-fcrossjumping
-fcse-follow-jumps
-fcse-skip-blocks
-fdelete-null-pointer-checks
-fexpensive-optimizations
-fgcse
-fgcse-lm
-findirect-inlining
-foptimize-sibling-calls
-fpeephole2
-fregmove
-freorder-blocks
-freorder-functions
-frerun-cse-after-loop
-fsched-interblock
-fsched-spec
-fschedule-insns
-fschedule-insns2
-fstrict-aliasing
-fstrict-overflow
-ftree-switch-conversion
-ftree-pre
-ftree-vrp
-finline-functions
-funswitch-loops
-fpredictive-commoning
-fgcse-after-reload
-ftree-vectorize
-fipa-cp-clone
I found out that it was -fschedule-insns that was causing the problem. Removing this optimization got the code working fine again.
Here is what I can't explain: GCC's documentation says that if you want to know exactly which optimizations GCC is applying, you can run this in the console:
gcc -Q -O3 --help=optimizers | grep "enabled"
I did, and here is the output:
-falign-functions [enabled]
-falign-jumps [enabled]
-falign-labels [enabled]
-falign-loops [enabled]
-fasynchronous-unwind-tables [enabled]
-fbranch-count-reg [enabled]
-fcaller-saves [enabled]
-fcombine-stack-adjustments [enabled]
-fcommon [enabled]
-fcompare-elim [enabled]
-fcprop-registers [enabled]
-fcrossjumping [enabled]
-fcse-follow-jumps [enabled]
-fdce [enabled]
-fdefer-pop [enabled]
-fdelete-null-pointer-checks [enabled]
-fdevirtualize [enabled]
-fdse [enabled]
-fearly-inlining [enabled]
-fexpensive-optimizations [enabled]
-fforward-propagate [enabled]
-fgcse [enabled]
-fgcse-after-reload [enabled]
-fgcse-lm [enabled]
-fguess-branch-probability [enabled]
-fif-conversion [enabled]
-fif-conversion2 [enabled]
-finline-functions [enabled]
-finline-functions-called-once [enabled]
-finline-small-functions [enabled]
-fipa-cp [enabled]
-fipa-cp-clone [enabled]
-fipa-profile [enabled]
-fipa-pure-const [enabled]
-fipa-reference [enabled]
-fipa-sra [enabled]
-fivopts [enabled]
-fjump-tables [enabled]
-fmath-errno [enabled]
-fmerge-constants [enabled]
-fmove-loop-invariants [enabled]
-foptimize-register-move [enabled]
-foptimize-sibling-calls [enabled]
-fpeephole [enabled]
-fpeephole2 [enabled]
-fpredictive-commoning [enabled]
-fprefetch-loop-arrays [enabled]
-fregmove [enabled]
-frename-registers [enabled]
-freorder-blocks [enabled]
-freorder-functions [enabled]
-frerun-cse-after-loop [enabled]
-frtti [enabled]
-fsched-critical-path-heuristic [enabled]
-fsched-dep-count-heuristic [enabled]
-fsched-group-heuristic [enabled]
-fsched-interblock [enabled]
-fsched-last-insn-heuristic [enabled]
-fsched-rank-heuristic [enabled]
-fsched-spec [enabled]
-fsched-spec-insn-heuristic [enabled]
-fsched-stalled-insns-dep [enabled]
-fschedule-insns2 [enabled]
-fshort-enums [enabled]
-fsigned-zeros [enabled]
-fsplit-ivs-in-unroller [enabled]
-fsplit-wide-types [enabled]
-fstrict-aliasing [enabled]
-fthread-jumps [enabled]
-fno-threadsafe-statics [enabled]
-ftoplevel-reorder [enabled]
-ftrapping-math [enabled]
-ftree-bit-ccp [enabled]
-ftree-builtin-call-dce [enabled]
-ftree-ccp [enabled]
-ftree-ch [enabled]
-ftree-copy-prop [enabled]
-ftree-copyrename [enabled]
-ftree-cselim [enabled]
-ftree-dce [enabled]
-ftree-dominator-opts [enabled]
-ftree-dse [enabled]
-ftree-forwprop [enabled]
-ftree-fre [enabled]
-ftree-loop-distribute-patterns [enabled]
-ftree-loop-if-convert [enabled]
-ftree-loop-im [enabled]
-ftree-loop-ivcanon [enabled]
-ftree-loop-optimize [enabled]
-ftree-phiprop [enabled]
-ftree-pre [enabled]
-ftree-pta [enabled]
-ftree-reassoc [enabled]
-ftree-scev-cprop [enabled]
-ftree-sink [enabled]
-ftree-slp-vectorize [enabled]
-ftree-sra [enabled]
-ftree-switch-conversion [enabled]
-ftree-ter [enabled]
-ftree-vect-loop-version [enabled]
-ftree-vectorize [enabled]
-ftree-vrp [enabled]
-funit-at-a-time [enabled]
-funswitch-loops [enabled]
-fvar-tracking [enabled]
-fvar-tracking-assignments [enabled]
-fvect-cost-model [enabled]
-fweb [enabled]
-fschedule-insns is not in the list; it is marked as disabled if I remove the grep. If I take all the optimizations listed in GCC's command output and compile the problematic file with that list, the code still passes. What is wrong here?
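The comparison described above, documented flags versus what the compiler itself reports, can be scripted rather than done by eye. The following is a minimal sketch, assuming the documented flags have been saved one per line in a file. The helper names enabled_flags and only_in_first and the file names used afterwards are hypothetical and not part of GCC; the sketch relies only on the "[enabled]" marker visible in the output above.

```shell
#!/bin/sh
# Read a `gcc -Q --help=optimizers` dump on stdin and print only the
# names of the flags whose state column is "[enabled]".
enabled_flags() {
    awk '$2 == "[enabled]" { print $1 }'
}

# Print the lines of file $1 that do not occur as whole lines in file $2.
# -F: fixed strings, -x: whole-line match, -v: invert, -f: patterns from file.
only_in_first() {
    grep -Fxv -f "$2" "$1"
}
```

For example, running `gcc -Q -O3 --help=optimizers | enabled_flags > actual.txt` and then `only_in_first documented.txt actual.txt` would print the documented flags that this particular compiler does not report as enabled.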
Here is a wrap-up:
If I use -O3 directly, it fails.
If I use all the optimizations of -O3 listed in GCC's documentation, it fails.
If I use all the optimizations of -O3 reported by GCC on the command line, it passes.
Finally, if I use all the optimizations of -O3 listed in GCC's documentation excluding -fschedule-insns, it passes.
What is the true optimization listing of -O3? GCC's documentation, or what GCC reports via the command line? I'm confused and out of ideas on how to get a logical explanation for this.
Has anybody faced this kind of issue with GCC?
Excellent question. You've just discovered that, as always, the source is the only truth. There is even a bug in GCC's Bugzilla for this.
I'll draw your attention to two places in the GCC source code.
In gcc-4.6.3/gcc/opts.c, line 474, we see the following within the table of default options:
    { OPT_LEVELS_2_PLUS, OPT_frerun_cse_after_loop, NULL, 1 },
    { OPT_LEVELS_2_PLUS, OPT_fcaller_saves, NULL, 1 },
    { OPT_LEVELS_2_PLUS, OPT_fpeephole2, NULL, 1 },
#ifdef INSN_SCHEDULING
    /* Only run the pre-regalloc scheduling pass if optimizing for speed. */
    { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_fschedule_insns, NULL, 1 },
    { OPT_LEVELS_2_PLUS, OPT_fschedule_insns2, NULL, 1 },
#endif
    { OPT_LEVELS_2_PLUS, OPT_fregmove, NULL, 1 },
    { OPT_LEVELS_2_PLUS, OPT_fstrict_aliasing, NULL, 1 },
    { OPT_LEVELS_2_PLUS, OPT_fstrict_overflow, NULL, 1 },
    { OPT_LEVELS_2_PLUS, OPT_freorder_blocks, NULL, 1 },
In gcc-4.6.3/gcc/config/i386/i386.c, line 5166, we see:
static const struct default_options ix86_option_optimization_table[] =
  {
    /* Turn off -fschedule-insns by default. It tends to make the
       problem with not enough registers even worse. */
#ifdef INSN_SCHEDULING
    { OPT_LEVELS_ALL, OPT_fschedule_insns, NULL, 0 },
#endif
#ifdef SUBTARGET_OPTIMIZATION_OPTIONS
    SUBTARGET_OPTIMIZATION_OPTIONS,
#endif
    { OPT_LEVELS_NONE, 0, NULL, 0 }
  };
We may draw the conclusion that the documentation is only partially correct: some passes are actually disabled on some targets, even at the -O level at which they would normally be enabled. In particular, the x86, mep and mcore-derived targets disable -fschedule-insns at all optimization levels by default, even though it is supposed to be enabled at -O2 and up. You can still force-enable it manually, but you then run the risks for which it was disabled in the first place.
Also, -fschedule-insns may be disabled by default at all levels if the compiler was built without INSN_SCHEDULING defined.
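Because the defaults depend on the target and on how the compiler was built, it may be more reliable to ask the installed compiler about a single flag than to rely on the manual. A small sketch, where flag_state is a hypothetical helper (not part of GCC) that reads a `gcc -Q --help=optimizers` dump on standard input:

```shell
#!/bin/sh
# Print the state column (for example "[enabled]" or "[disabled]") that a
# `gcc -Q --help=optimizers` dump on stdin reports for the flag named in $1.
# Prints nothing if the flag does not appear in the dump.
flag_state() {
    awk -v flag="$1" '$1 == flag { print $2 }'
}
```

For example, `gcc -Q -O3 --help=optimizers | flag_state -fschedule-insns` prints the state for that flag; according to the question, the asker's compiler reports it as disabled.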