
Why is compiling code that assigns lambdas to std::function so slow, in particular with Clang?

I discovered that compiling a relatively small amount of code that converts lambda functions to std::function<> values can take a very long time, in particular with the Clang compiler.

Consider the following dummy code that creates 100 lambda functions:

#if MODE==1
#include <functional>
using LambdaType = std::function<int()>;
#elif MODE==2
using LambdaType = int(*)();
#elif MODE==3
#include "function.h" // https://github.com/skarupke/std_function
using LambdaType = func::function<int()>;
#endif

static int total=0;

void add(LambdaType lambda)
{
    total += lambda();
}

int main(int argc, const char* argv[])
{
    add([]{ return 1; });
    add([]{ return 2; });
    add([]{ return 3; });
    // 96 more such lines...
    add([]{ return 100; });

    return total == 5050 ? 0 : 1;
}

Depending on the MODE preprocessor macro, the code selects one of the following three ways to pass a lambda function to the add function:

  1. the std::function<> class template
  2. a plain C function pointer (possible here only because the lambdas capture nothing)
  3. a fast replacement for std::function written by Malte Skarupke ( https://probablydance.com/2013/01/13/a-faster-implementation-of-stdfunction/ )

Whatever the mode, the program always exits with the regular exit code 0. But now look at the compilation time with Clang:

$ time clang++ -c -std=c++11 -DMODE=1 lambdas.cpp 
real    0m8.162s
user    0m7.828s
sys 0m0.318s

$ time clang++ -c -std=c++11 -DMODE=2 lambdas.cpp 
real    0m0.109s
user    0m0.056s
sys 0m0.046s

$ time clang++ -c -std=c++11 -DMODE=3 lambdas.cpp 
real    0m0.870s
user    0m0.814s
sys 0m0.051s

$ clang++ --version
Apple LLVM version 10.0.0 (clang-1000.11.45.2)
Target: x86_64-apple-darwin17.7.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

Wow. Compilation is roughly 75 times slower in std::function mode than in function pointer mode (8.162 s vs. 0.109 s)! And it is still almost 10 times slower with std::function than with its replacement.

How can this be? Is this a performance problem specific to Clang, or is it due to the inherent complexity of what std::function is required to do?

I also compiled the same code with GCC 5.4 and Visual Studio 2015. There are big compile time differences there too, but not as large.

GCC:

$ time g++ -c -std=c++11 -DMODE=1 lambdas.cpp 
real    0m1.179s
user    0m1.080s
sys 0m0.092s

$ time g++ -c -std=c++11 -DMODE=2 lambdas.cpp 
real    0m0.136s
user    0m0.120s
sys 0m0.012s

$ time g++ -c -std=c++11 -DMODE=3 lambdas.cpp 
real    0m1.994s
user    0m1.792s
sys 0m0.196s

Visual Studio:

C:\>ptime cl /c /DMODE=1 /EHsc /nologo lambdas.cpp
Execution time: 2.411 s

C:\>ptime cl /c /DMODE=2 /EHsc /nologo lambdas.cpp
Execution time: 0.270 s

C:\>ptime cl /c /DMODE=3 /EHsc /nologo lambdas.cpp
Execution time: 1.122 s

I am now considering using Malte Skarupke's implementation, both for slightly better runtime performance and for a big compile time improvement.

Have a look at what the compiler has to process in each case with the --save-temps option. On my machine with clang 6.0.1, MODE=1 generates a 575K preprocessed file, because of the multitude of standard library headers being included. MODE=2 generates a 416-byte file, more than 1000 times smaller. The generated assembly also differs in size by a factor of 10.

I can't test and interpret your example myself; however, starting with Clang 9.0.0, the compiler can produce a time trace of your compilation. See the Phoronix article for an impression and links to more info. In short, adding -ftime-trace to the command line produces a JSON file describing what the compiler is doing, which you can visualize as a nice graph.

If you notice something really strange, you can always file a bug at bugs.llvm.org with a good reproduction (I think this question, with some of its wording changed, would do).

Let me also add a small comment about the testing code. I'm not surprised that the std::function version is slower to compile, as it requires an extra include to parse (and standard library headers are huge). The slowdown at run time is also logical: std::function adds extra indirection that the optimizer often cannot remove.
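To illustrate where that indirection comes from, here is a minimal type-erasure sketch. The names tiny_function, callable_base, holder and sum_two are made up for this illustration; this is not how libc++, libstdc++ or Malte Skarupke's library implement it, only the general idea. Each distinct lambda type forces the compiler to instantiate a separate holder, with its own vtable and member functions, which is presumably part of why 100 lambdas are not free to compile.

```cpp
#include <memory>
#include <utility>

// Hypothetical, heavily simplified stand-in for std::function<int()>.
// Real standard library implementations are considerably more involved.
class tiny_function
{
    // Common interface that hides the concrete closure type.
    struct callable_base
    {
        virtual ~callable_base() = default;
        virtual int invoke() = 0;
    };

    // One holder<F> (vtable, constructor, destructor, invoke) is
    // instantiated for every distinct closure type F.
    template <typename F>
    struct holder final : callable_base
    {
        explicit holder(F f) : fn(std::move(f)) {}
        int invoke() override { return fn(); }
        F fn;
    };

    std::unique_ptr<callable_base> impl;

public:
    // Accepts any callable returning something convertible to int.
    template <typename F>
    tiny_function(F f) : impl(new holder<F>(std::move(f))) {}

    // The call goes through a virtual function: this is the indirection.
    int operator()() { return impl->invoke(); }
};

// Hypothetical usage: two lambdas, hence two separate holder instantiations.
int sum_two()
{
    tiny_function a([] { return 1; });
    tiny_function b([] { return 2; });
    return a() + b();
}
```

Every lambda expression has its own unique type, so the 100 calls in the question would trigger 100 such instantiations in a wrapper like this one.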

I would recommend adding a 4th case where add is a template and the function type is the template argument:

template<typename LambdaType>
void add(LambdaType &&lambda)
{
    total += lambda();
}
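As a rough sketch of what such a fourth mode could look like in full, the snippet below puts the template version of add next to a few of the lambdas from the question. The run_mode4 function is a hypothetical driver standing in for the body of main; I have not measured its compile time.

```cpp
// Sketch of a hypothetical MODE 4: add() is a function template, so each
// lambda is called through its own closure type, without type erasure
// and without including <functional>.
static int total = 0;

template <typename LambdaType>
void add(LambdaType&& lambda)
{
    total += lambda();  // direct call on the closure object
}

// Hypothetical driver standing in for the body of main() in the question.
int run_mode4()
{
    total = 0;
    add([] { return 1; });
    add([] { return 2; });
    add([] { return 3; });
    return total;
}
```

Note that this still instantiates add once per lambda, since every lambda has a distinct type, but each instantiation is tiny and no standard library header is needed.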
