
Which gcc optimization flags should I use?

If I want to minimize the run time of my C programs, which optimization flags should I use? (I also want to stay standard-conforming.)

Currently I'm using:

 -Wall -Wextra -pedantic -ansi -O3

Should I also use

-std=c99

for example?

And is there a specific order in which I should put those flags in my makefile? Does it make any difference?

Also, is there any reason not to use all the optimization flags I can find? Do they ever counteract each other, or something like that?

The flag -std=c99 does not change the optimization level. It only changes which language standard you want the compiler to conform to.

You use -std=c99 when you want the compiler to treat your program as a C99 program. Note that -ansi is equivalent to -std=c90 for C, so -std=c99 replaces -ansi rather than complementing it.
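As a rough illustration of what the standard flag changes, the sketch below uses three C99 features. The names `struct range`, `sum_range`, and `sum_one_to_ten` are made up for this example. Compiled with -std=c99 it should be accepted; with -ansi -pedantic gcc is expected to diagnose each marked line, as a warning or an error.

```c
/* Assumes -std=c99 or later. Every line marked "C99" is rejected or
   warned about under -ansi -pedantic. */
struct range { int lo; int hi; };

// Sum the integers in [lo, hi]; an empty range gives 0.   (// comment: C99)
long sum_range(struct range r)
{
    long total = 0;
    for (int i = r.lo; i <= r.hi; ++i)   /* loop-scoped declaration: C99 */
        total += i;
    return total;
}

long sum_one_to_ten(void)
{
    struct range r = { .lo = 1, .hi = 10 };   /* designated initializer: C99 */
    return sum_range(r);
}
```

None of this affects how well the code is optimized; it only decides which dialect the compiler accepts.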

Among the flags you listed, the only one that has to do with optimization is -O3. The others serve other purposes.

You may want to try adding -funroll-loops, which -O3 does not enable by itself. -fomit-frame-pointer is already enabled at the -O levels on most targets, so adding it explicitly usually changes nothing.

I'd recommend compiling new code with -std=gnu11, or -std=c11 if needed. Fixing your code until -Wall reports nothing is usually a good idea, IIRC. -Wextra warns about some things you might not want to change.
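One common case of a -Wextra warning you may not want to "fix" by changing an interface is an unused parameter in a callback. The sketch below is only an illustration: `visit_fn`, `count_positive`, and `visit_all` are hypothetical names, not part of any real API. With -Wall -Wextra, gcc would report the unused `context` parameter if the cast to void were removed.

```c
#include <stddef.h>

/* Hypothetical callback type: the signature is fixed by the caller,
   so an implementation may not need every parameter. */
typedef int (*visit_fn)(int value, void *context);

/* Returns 1 for positive values, 0 otherwise. The cast to void marks
   `context` as intentionally unused and silences -Wunused-parameter. */
int count_positive(int value, void *context)
{
    (void)context;
    return value > 0;
}

/* Apply `fn` to each element and add up the results. */
int visit_all(const int *values, size_t n, visit_fn fn, void *context)
{
    int total = 0;
    for (size_t i = 0; i < n; ++i)
        total += fn(values[i], context);
    return total;
}
```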


A good way to check how something compiles is to look at the compiler's asm output. http://gcc.godbolt.org/ formats the asm output nicely (stripping out the noise). If you understand asm at all, it is useful to put some key functions up there and look at what different compiler versions do.


Use a new compiler version. gcc and clang have both improved significantly in newer versions. At the time of writing, gcc 5.3 and clang 3.8 are the current releases. gcc 5 makes noticeably better code than gcc 4.9.3 in some cases.


If you only need the binary to run on your own machine, you should use -O3 -march=native.

If you need the binary to run on other machines, choose the baseline for instruction-set extensions with options like -mssse3 -mpopcnt. You can use -mtune=haswell to optimize for Haswell while still producing code that runs on older CPUs (as determined by -march).
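To see what a baseline option such as -mpopcnt buys you, a small bit-counting function is a convenient test case. This is a sketch, not a benchmark: `count_set_bits` and `built_with_popcnt` are names invented here, and `__builtin_popcountll` is a gcc/clang builtin rather than standard C. On x86 with -mpopcnt (or a -march that implies it) the builtin should compile to a single popcnt instruction; without it, the compiler falls back to a generic software sequence.

```c
#include <stdint.h>

/* Count the set bits in a 64-bit word using the gcc/clang builtin. */
int count_set_bits(uint64_t x)
{
    return __builtin_popcountll(x);
}

/* Reports whether this translation unit was compiled with the popcnt
   extension enabled; gcc and clang define __POPCNT__ in that case. */
int built_with_popcnt(void)
{
#ifdef __POPCNT__
    return 1;
#else
    return 0;
#endif
}
```

Comparing the asm for `count_set_bits` with and without -mpopcnt shows the difference directly.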


If your program doesn't depend on strict FP rounding behaviour, use -ffast-math. If it does, you can usually still use -fno-math-errno and similar options without enabling -funsafe-math-optimizations. Some FP code can get big speedups from fast-math, for example because it allows auto-vectorization that strict FP rules forbid.
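A floating-point reduction is the classic case where fast-math matters. In the sketch below (`sum_floats` is a name chosen for illustration), strict FP semantics require the additions to happen in source order, so the compiler cannot split the sum across SIMD lanes. With -ffast-math it is allowed to reassociate the additions and may vectorize the loop, at the cost of possibly different rounding in the result.

```c
#include <stddef.h>

/* Sum n floats in index order. Without -ffast-math, the loop-carried
   dependency on `total` keeps this scalar; with it, the compiler may
   reorder the additions and vectorize. */
float sum_floats(const float *a, size_t n)
{
    float total = 0.0f;
    for (size_t i = 0; i < n; ++i)
        total += a[i];
    return total;
}
```

For inputs where every partial sum is exactly representable, the result is the same either way; for general data the last bits can differ.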


If you can do a test run of your program that exercises most of the code paths that need to be optimized for a real run, then use profile-guided optimization:

gcc  -fprofile-generate -Wall -Wextra -std=gnu11 -O3 -ffast-math -march=native -fwhole-program *.c -o my_program
./my_program -option1 < test_input1
./my_program -option2 < test_input2
gcc  -fprofile-use      -Wall -Wextra -std=gnu11 -O3 -ffast-math -march=native -fwhole-program *.c -o my_program

-fprofile-use enables -funroll-loops, since the profile gives the compiler enough information to decide when unrolling actually pays off. Unrolling loops all over the place can make things worse. However, it's worth trying -funroll-loops without a profile to see if it helps.

If your test runs don't cover all the code paths, then some important ones will be marked as "cold" and optimized less.


In the gcc versions discussed here, -O3 enables auto-vectorization, which -O2 doesn't. (gcc 12 and later also vectorize at -O2, with a more conservative cost model.) This can give big speedups.
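A simple element-wise loop is the kind of code auto-vectorization handles well. The following is a minimal sketch with a made-up function name, `scale_add`; the `restrict` qualifiers (C99) promise the compiler that the arrays do not overlap, which removes the need for a runtime aliasing check. Since each element is computed independently, no fast-math option is needed for this loop to be vectorized.

```c
#include <stddef.h>

/* out[i] = k * x[i] + y[i] for i in [0, n).
   The caller must guarantee that out, x, and y do not overlap. */
void scale_add(float *restrict out, const float *restrict x,
               const float *restrict y, float k, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = k * x[i] + y[i];
}
```

To check whether the loop was actually vectorized, look at the asm output, or ask gcc to report it with -fopt-info-vec.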

-fwhole-program allows cross-file inlining, but only works when you put all the source files on one gcc command line. -flto (link-time optimization) is another way to get the same effect. clang supports -flto but not -fwhole-program.

-fomit-frame-pointer has been the default for a while now on x86-64, and more recently on x86 (32-bit).


As well as gcc, try compiling your program with clang. Clang sometimes makes better code than gcc, sometimes worse. Try both and benchmark.
