Why do I get a faster binary with Xcode's LLVM compiler vs. clang++ from MacPorts?

I have written a benchmark method to test my C++ program (which searches a game tree), and I am noticing that compiling with the "LLVM compiler 2.0" option in Xcode 4.0.2 gives me a significantly faster binary than compiling with the latest version of clang++ from MacPorts.

If I understand correctly, I am using a clang front-end and an llvm back-end in both cases. Has Apple made improvements to their clang/llvm distribution to produce faster binaries for Mac OS? I can't find much information about the project.

Here are the benchmark scores my program produces with various compilers, all using -O3 optimization (higher is better):

(Xcode) "gcc 4.2": 38.7
(Xcode) "llvm gcc 4.2": 51.2
(Xcode) "llvm compiler 2.0": 50.6
g++-mp-4.6: 43.4
clang++: 40.6

Also, how do I compile from the terminal with the clang/llvm that Xcode is using? I can't find the command.

EDIT: The scores I output are "thousands of games per second", calculated over a long enough run of the program. Scores are very consistent over multiple runs. For comparison, recent major algorithmic improvements have given me speedups of only 1% to 5%, so a 25% speedup, from 40 to 50, is huge for my program.
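To make the scoring concrete, here is a minimal sketch of how a "thousands of games per second" figure and a percentage speedup could be computed. It is not the actual benchmark method from my program: the names `thousands_per_second`, `percent_speedup`, `measure` and the callable `play_one_game` are all hypothetical and stand in for whatever the real code does.

```cpp
#include <chrono>
#include <cstdint>

// Convert a raw game count and elapsed time into the score unit used above:
// thousands of games per second.
constexpr double thousands_per_second(std::uint64_t games, double seconds) {
    return static_cast<double>(games) / seconds / 1000.0;
}

// Relative speedup of `after` over `before`, in percent.
constexpr double percent_speedup(double before, double after) {
    return (after - before) / before * 100.0;
}

// Call `play_one_game` (a hypothetical callable standing in for one game-tree
// search) repeatedly until at least `min_seconds` (which must be > 0) have
// elapsed, then return the throughput in thousands of games per second.
template <typename PlayOneGame>
double measure(PlayOneGame&& play_one_game, double min_seconds) {
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    std::uint64_t games = 0;
    double elapsed = 0.0;
    do {
        play_one_game();
        ++games;
        elapsed = std::chrono::duration<double>(clock::now() - start).count();
    } while (elapsed < min_seconds);
    return thousands_per_second(games, elapsed);
}
```

With these definitions, `percent_speedup(40.0, 50.0)` gives the 25% quoted above. Reading the clock after every game adds a little overhead, so if a single game is very short it may be better to check the clock only every N games.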

UPDATE: I wasn't invoking clang++ from the command line with -flto. Now when I compare clang++ -O3 -flto to /Developer/usr/bin/clang++ -O3 -flto from the command line, the results are closer, but the Apple one is still 6.5% faster.

Now, how do I enable link-time optimization for gcc? When I try g++ -flto I get the following error:

cc1plus: error: LTO support has not been enabled in this configuration

Apple LLVM Compiler should be available under /Developer/usr/bin/clang.

I can't think of any particular reason why MacPorts clang++ would generate slower code... I would check whether you're passing in comparable command-line options. One thing that would make a large difference is producing 32-bit code with one compiler and 64-bit code with the other.
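One way to check this is to have each binary report what it was actually built as. The following is only a sketch: the function names are made up for illustration, while the macros `__clang__`, `__clang_version__`, `__GNUC__` and `__VERSION__` are the compilers' own predefined macros. `__apple_build_version__` is defined by current Apple clang builds; I am assuming, not asserting, that the compiler shipped with your Xcode defines it too.

```cpp
#include <climits>
#include <cstddef>

// Width of a pointer in bits: 32 for a 32-bit build, 64 for a 64-bit build.
constexpr std::size_t pointer_bits() {
    return sizeof(void*) * CHAR_BIT;
}

// Compiler identification assembled from predefined macros.
constexpr const char* compiler_version() {
#if defined(__clang__)
    return "clang " __clang_version__;
#elif defined(__GNUC__)
    return "gcc " __VERSION__;
#else
    return "unknown compiler";
#endif
}

// True when built by an Apple-branded clang (assumption: the macro below is
// defined by the Apple toolchain in use).
constexpr bool is_apple_clang() {
#if defined(__apple_build_version__)
    return true;
#else
    return false;
#endif
}
```

Printing `pointer_bits()` and `compiler_version()` at the start of the benchmark in each build makes it obvious if the two binaries differ in word size or come from a different compiler than expected.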

If your GCC was built without LTO support, you need to build it yourself:

http://solarianprogrammer.com/2012/07/21/compiling-gcc-4-7-1-mac-osx-lion/

For LTO you need to add 'libelf' to the instructions.

http://sourceforge.net/apps/trac/mingw-w64/wiki/LTO%20and%20GCC

The exact speed of an algorithm can depend on all kinds of things that are entirely outside your control and the compiler's. You may have a loop whose execution time depends on precisely how the instructions are aligned in memory, in a way that the compiler couldn't predict. I have seen cases where a loop could enter different "states" with different execution times per iteration (so after a context switch, it could enter a state where it took either 12 or 13 cycles, rather randomly). This can all be coincidence.

And you might be using different libraries, which is quite possibly the reason. On Mac OS X, they are using a new and presumably faster implementation of std::string and std::vector, for example.
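To see which standard library each build is actually using, a small check like the following may help. It is a sketch that assumes the two candidates are LLVM's libc++ and GNU's libstdc++ (neither is named above), and the function name is made up. The macros `_LIBCPP_VERSION` and `__GLIBCXX__` are defined by those libraries' own headers.

```cpp
#include <string>  // including any standard header defines the library's version macro

// Name of the C++ standard library this translation unit was compiled against.
constexpr const char* standard_library_name() {
#if defined(_LIBCPP_VERSION)
    return "libc++";
#elif defined(__GLIBCXX__)
    return "libstdc++";
#else
    return "unknown standard library";
#endif
}
```

If the two binaries report different libraries, differences in std::string and std::vector performance become a plausible explanation. With clang++ the library can usually be selected explicitly with -stdlib=libc++ or -stdlib=libstdc++, provided that library is installed.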
