Global function is slower than functor or lambda when passed to std::sort

Question

I made a small test to compare the performance of a global function, a functor, and a lambda as the comparator argument to std::sort. The functor and the lambda perform the same. I was surprised to see that the global function, which appears to be the simplest callback, is much slower.

#include <stdafx.h>
#include <windows.h>
#include <iostream>
#include <stdlib.h>
#include <time.h>
#include <vector>
#include <string>
#include <sstream>
#include <algorithm>
using namespace std;

const int vector_size = 100000;

bool CompareFunction(const string& s1, const string& s2) 
{ 
    return s1[0] < s2[0];  // I know that this crashes on an empty string, but that is not the point here
}

struct CompareFunctor 
{
    bool operator() (const string& s1, const string& s2) const
    { 
        return s1[0] < s2[0]; 
    }
} compareFunctor;

int main()
{
    srand ((unsigned int)time(NULL));
    vector<string> v(vector_size);

    for(size_t i = 0; i < vector_size; ++i)
    {
        ostringstream s;
        s << rand();
        v[i] = s.str();
    }

    LARGE_INTEGER freq;
    LARGE_INTEGER beginTime, endTime;
    QueryPerformanceFrequency(&freq);
    QueryPerformanceCounter(&beginTime);

    // One of three following lines should be uncommented
    sort(v.begin(), v.end(), CompareFunction);
    // sort(v.begin(), v.end(), compareFunctor);
    // sort(v.begin(), v.end(), [](const string& s1, const string& s2){return s1[0] < s2[0];});

    QueryPerformanceCounter(&endTime);
    float f = (endTime.QuadPart - beginTime.QuadPart) *  1000.0f/freq.QuadPart;      // time in ms
    cout << f << endl;

    return 0;
}

A bit of Windows-specific code is used for precise execution time measurement. Environment: Windows 7, Visual C++ 2010, Release configuration with default optimizations turned on. Execution time:

Global function: 2.6 - 3.6 ms   (???)
Functor: 1.7 - 2.4 ms
Lambda: 1.7 - 2.4 ms

So why is the global function slower? Is it a problem with the VC++ compiler, or something else?

Answer 1

The lambda and functor versions are inlined, effectively eliminating the pushing and popping of arguments for every compare.

Try using

inline bool CompareFunction(const string& s1, const string& s2) 
{ 
    return s1[0] < s2[0];  // I know that this crashes on an empty string, but that is not the point here
}

and see if it makes a difference. Note that automatic inlining varies a lot depending on the compiler, build version, etc. I would be surprised if the compiler didn't automatically inline your global function, unless you're actually compiling in debug mode, which you shouldn't be doing for a performance test case. To really test whether inlining is the issue, you should divide your test into two files and compile them separately:

Replace

bool CompareFunction(const string& s1, const string& s2){ 
    return s1[0] < s2[0];  // I know that this crashes on an empty string, but that is not the point here
}

with

bool CompareFunction(const string& s1, const string& s2);

and put the definition in a separate file, say compare.cpp.

While you're at it, you could frustrate inlining for functors as well by using:

struct CompareFunctor 
{
    bool operator() (const string& s1, const string& s2) const;
} compareFunctor;

and putting the definition in a separate file:

bool CompareFunctor::operator() (const string& s1, const string& s2) const
{ 
    return s1[0] < s2[0]; 
}
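A single-file variant of the same experiment is sketched below. Splitting the code across files only blocks inlining if the optimizer works one translation unit at a time; with link-time code generation enabled (the /GL and /LTCG options in Visual C++) it may still inline across files. The sketch instead marks the call operator with a compiler-specific "do not inline" attribute. The macro name NOINLINE, the type NoInlineFunctor, and the helper SortWithNoInlineFunctor are made up for illustration and are not part of the original test.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Compiler-specific "do not inline" markers; NOINLINE is a name invented for this sketch.
#if defined(_MSC_VER)
#define NOINLINE __declspec(noinline)
#else
#define NOINLINE __attribute__((noinline))  // GCC and Clang
#endif

// Same comparison as CompareFunctor, but the compiler is told not to inline the call operator.
struct NoInlineFunctor {
    NOINLINE bool operator()(const std::string& s1, const std::string& s2) const {
        return s1[0] < s2[0];
    }
};

// Sorts with the non-inlinable functor; time this against the plain functor.
inline void SortWithNoInlineFunctor(std::vector<std::string>& v) {
    std::sort(v.begin(), v.end(), NoInlineFunctor());
    assert(std::is_sorted(v.begin(), v.end(), NoInlineFunctor()));  // sanity check, debug builds only
}
```

If the functor timing moves toward the global-function timing once inlining is blocked this way, that would support the inlining explanation.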

Answer 2

Passing a global function is the most complex case, not the simplest.

When you pass in a function, you are in fact passing a pointer to the function, so the sort function can't easily inline the call: at compile time it doesn't know what the pointer will point to. Sure, the compiler may be able to figure out that the call through the function pointer reaches the same function every time and inline it all, but that's difficult.

When you use a lambda or functor, the compiler knows exactly which function it needs to call when it is generating the code, so it is very likely to be able to inline it all.
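The difference can be made visible in the types, as in the sketch below. All names in it (ByFirstChar, ByLength, FirstCharLess, and the three Sort... helpers) are illustrative and do not come from the question. Two unrelated functions with the same signature share one pointer type, so the comparator type tells std::sort nothing about the callee, whereas a functor or lambda type identifies exactly one call operator. The last helper shows one possible workaround: wrapping the function in a lambda so the comparator gets its own type.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <type_traits>
#include <vector>

// Two unrelated comparators with the same signature.
bool ByFirstChar(const std::string& a, const std::string& b) { return a[0] < b[0]; }
bool ByLength(const std::string& a, const std::string& b) { return a.size() < b.size(); }

// Functor: the comparison is part of the type itself.
struct FirstCharLess {
    bool operator()(const std::string& a, const std::string& b) const { return a[0] < b[0]; }
};

// Both functions have the same pointer type, so one std::sort instantiation
// serves both and the callee is known only from the pointer's run-time value.
static_assert(std::is_same<decltype(&ByFirstChar), decltype(&ByLength)>::value,
              "functions with equal signatures share one pointer type");

// Comparator type is bool (*)(const std::string&, const std::string&).
inline void SortViaPointer(std::vector<std::string>& v) {
    std::sort(v.begin(), v.end(), ByFirstChar);
    assert(std::is_sorted(v.begin(), v.end(), ByFirstChar));
}

// Comparator type is FirstCharLess: this instantiation can only ever call
// FirstCharLess::operator(), whose body is visible to the compiler here.
inline void SortViaFunctor(std::vector<std::string>& v) {
    std::sort(v.begin(), v.end(), FirstCharLess());
    assert(std::is_sorted(v.begin(), v.end(), FirstCharLess()));
}

// Wrapping the function in a lambda gives the comparator a unique closure
// type, and the body calls ByFirstChar directly instead of through a pointer.
inline void SortViaWrappedFunction(std::vector<std::string>& v) {
    std::sort(v.begin(), v.end(),
              [](const std::string& a, const std::string& b) { return ByFirstChar(a, b); });
    assert(std::is_sorted(v.begin(), v.end(), ByFirstChar));
}
```

Whether any of these calls is actually inlined remains up to the compiler; the sketch only shows what information each comparator type carries.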

Answer 3

You should call the sort a few thousand times to get more precise results.
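A minimal sketch of such a repeated measurement follows. It assumes a C++11 compiler with <chrono>, which is newer than the Visual C++ 2010 used in the question, so there the QueryPerformanceCounter calls would have to stay. The helper name BestSortTimeMs and the choice to report the fastest run are assumptions of this sketch, not something the answer prescribes.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <string>
#include <vector>

// Runs std::sort `runs` times, each time on a fresh unsorted copy of `input`,
// and returns the fastest run in milliseconds (or -1.0 if runs <= 0).
// The copy is made outside the timed region.
template <typename Compare>
double BestSortTimeMs(const std::vector<std::string>& input, Compare comp, int runs) {
    typedef std::chrono::steady_clock Clock;
    double best = -1.0;
    for (int i = 0; i < runs; ++i) {
        std::vector<std::string> work = input;  // sorting sorted data would skew the timing
        const Clock::time_point start = Clock::now();
        std::sort(work.begin(), work.end(), comp);
        const Clock::time_point stop = Clock::now();
        assert(std::is_sorted(work.begin(), work.end(), comp));  // sanity check, debug builds only
        const double ms = std::chrono::duration<double, std::milli>(stop - start).count();
        if (best < 0.0 || ms < best) best = ms;
    }
    return best;
}
```

With the question's code, the three variants could then be compared as BestSortTimeMs(v, CompareFunction, 1000), BestSortTimeMs(v, compareFunctor, 1000), and the same call with the lambda.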

How fast this goes depends on the compiler's smarts. It might inline some operations (lambdas very probably, functors probably, non-inline globals unlikely). Whether the comparison is inlined also depends on its complexity, and the results will differ accordingly.

I'd strongly advise against looking at such detailed "optimizations." Your programming time is much more expensive than the (very small) gain you'll get in run time. Concentrate on writing clean, understandable, simple code. Trying to understand "bummed for ultimate speed" code next week will just make you go prematurely bald.

3 answers

Answer 1: 2, 2014-01-29 15:57:29

Answer 2: 2, 2014-01-29 16:25:25

Answer 3: 0, 2014-01-29 12:55:49
