
Why would gcc -O0 be faster than icc -O0?

For a brief report I have to write, our class ran code on a cluster, compiled with both gcc -O0 and icc -O0. We found that the gcc build was about 2.5 times faster than the icc build, even without any optimizations. Why is this? Does gcc -O0 actually perform some minor optimizations, or does it simply happen to work better on this system?

The code was an implementation of the naive string-searching algorithm found here, written in C.

Thank you

Performance at -O0 is neither interesting nor indicative of anything. Compiling at -O0 explicitly says "I don't care about performance", and the compiler takes you up on that: it just does whatever happens to be simplest. By chance, what is simplest for GCC turns out to be faster than what is simplest for ICC for one highly specific microbenchmark on your specific hardware configuration. If you ran 100 other microbenchmarks, you would probably find some where ICC is faster, too. Even if you didn't, that still wouldn't mean much. If you're going to compare performance across compilers, turn on optimizations, because that's what you do if you care about performance.

If you want to understand why one is faster, profile the execution. Where is the execution time being spent? Where are there stalls? Why do those stalls occur?

A few things to take into account:

  • The instruction set each compiler uses by default. For example, if your GCC build produces i686 code by default while ICC restricts itself to i586 opcodes, you would probably see a significant performance difference.

  • The actual CPUs in your cluster. If you are using AMD processors instead of Intel CPUs, then ICC is at a disadvantage because it is, of course, targeted specifically at Intel processors.

  • You mentioned using a cluster. Does this speed difference exist on a single processor as well? If you used any parallelisation facilities provided by your compiler, there could be significant differences there.

  • Simplistically, when optimisations are disabled, the compiler uses pre-made "templates" for each code construct. Since these templates are intended to be optimised afterwards, they are constructed in a way that enables the optimisation passes to produce better code. Whether they happen to be slower or faster at -O0 does not really mean anything: for example, more explicit initial code could be easier to optimise but far slower to execute.

That said, the only way to find out what is going on is to profile the execution of your code and, if necessary, look at the assembly for those parts of the code where the major differences lie.

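Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original post's address. For any questions, contact: yoyou2525@163.com.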

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM