Global function is slower than functor or lambda when passed to std::sort

Question

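I ran a small test to compare the performance of a global function, a functor, and a lambda as the comparator argument to std::sort. The functor and the lambda perform the same. I was surprised to find that the global function, which looks like the simplest callback, is much slower.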

#include "stdafx.h"
#include <windows.h>
#include <iostream>
#include <stdlib.h>
#include <time.h>
#include <vector>
#include <string>
#include <sstream>
#include <algorithm>
using namespace std;

const int vector_size = 100000;

bool CompareFunction(const string& s1, const string& s2) 
{ 
    return s1[0] < s2[0];  // I know this crashes on an empty string, but that is not the point here
}

struct CompareFunctor 
{
    bool operator() (const string& s1, const string& s2) const
    { 
        return s1[0] < s2[0]; 
    }
} compareFunctor;

int main()
{
    srand ((unsigned int)time(NULL));
    vector<string> v(vector_size);

    for(size_t i = 0; i < vector_size; ++i)
    {
        ostringstream s;
        s << rand();
        v[i] = s.str();
    }

    LARGE_INTEGER freq;
    LARGE_INTEGER beginTime, endTime;
    QueryPerformanceFrequency(&freq);
    QueryPerformanceCounter(&beginTime);

    // One of three following lines should be uncommented
    sort(v.begin(), v.end(), CompareFunction);
    // sort(v.begin(), v.end(), compareFunctor);
    // sort(v.begin(), v.end(), [](const string& s1, const string& s2){return s1[0] < s2[0];});

    QueryPerformanceCounter(&endTime);
    float f = (endTime.QuadPart - beginTime.QuadPart) *  1000.0f/freq.QuadPart;      // time in ms
    cout << f << endl;

    return 0;
}

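Some Windows-specific code is used for precise execution time measurement. Environment: Windows 7, Visual C++ 2010. Release configuration, of course, with the default optimizations turned on. Execution times: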

Global function - 2.6 - 3.6 ms   (???)
Functor - 1.7 - 2.4 ms
Lambda - 1.7 - 2.4 ms

So, why is the global function slower? Is it a problem with the VC++ compiler, or something else?

Answer 1

The lambda and functor versions are effectively inlined, which eliminates pushing and popping the arguments for every comparison.

Try using

inline bool CompareFunction(const string& s1, const string& s2) 
{ 
    return s1[0] < s2[0];  // I know this crashes on an empty string, but that is not the point here
}

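and see if it makes a difference. Note that automatic inlining varies a lot with the compiler, the build version, and so on. I would be surprised if the compiler did not automatically inline your global function, unless you are actually compiling in debug mode, which you should not do for a performance test case. To really test whether inlining is the issue, you should split the test into two files and compile them separately.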

Replace

bool CompareFunction(const string& s1, const string& s2){ 
    return s1[0] < s2[0];  // I know this crashes on an empty string, but that is not the point here
}

with

bool CompareFunction(const string& s1, const string& s2);

and put the definition in a separate file, say compare.cpp.

While you are at it, you can defeat inlining of the functor by using the following:

struct CompareFunctor 
{
    bool operator() (const string& s1, const string& s2);
} compareFunctor;

and putting this in a separate file:

bool CompareFunctor::operator() (const string& s1, const string& s2)
{ 
    return s1[0] < s2[0]; 
}
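
If splitting the test into two files is inconvenient, or whole-program optimization would inline across files anyway, a compiler-specific attribute gives a rough single-file approximation of the same experiment. This is only a sketch, assuming MSVC's `__declspec(noinline)` or the GCC/Clang `__attribute__((noinline))`; the `NOINLINE` macro and the names `OutOfLineFunctor` and `sortWithoutInlining` are made up for illustration and are not part of the original answer. Like the code in the question, it assumes non-empty strings.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper macro: asks the compiler not to inline a function.
#if defined(_MSC_VER)
#  define NOINLINE __declspec(noinline)
#elif defined(__GNUC__) || defined(__clang__)
#  define NOINLINE __attribute__((noinline))
#else
#  define NOINLINE  // unknown compiler: expands to nothing, so the experiment proves nothing
#endif

// Same comparison as the functor in the question, but the call operator
// is marked so that it stays an out-of-line call.
struct OutOfLineFunctor
{
    NOINLINE bool operator()(const std::string& s1, const std::string& s2) const
    {
        return s1[0] < s2[0];  // assumes non-empty strings, as in the question
    }
};

// Sorts with the non-inlined functor; time this against the plain functor.
void sortWithoutInlining(std::vector<std::string>& v)
{
    std::sort(v.begin(), v.end(), OutOfLineFunctor());
}
```

If the functor slows down to roughly the speed of the global function once its call operator can no longer be inlined, that supports the inlining explanation.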

Answer 2

Passing a global function is the most complicated option, not the simplest.

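When you pass in a function, you are actually passing a pointer to the function, so the sort function cannot easily inline the call, because at compile time it does not know what the pointer points to. Of course, the compiler might be able to work out that every call through the function pointer goes to the same function and inline it all, but that is hard.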

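When you use a lambda or a functor, the compiler knows exactly which function has to be called while it is generating the code, so it is much more likely to be able to inline it all.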
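
The difference shows up in the comparator type that std::sort gets instantiated with. Here is a minimal sketch, assuming C++11 or later; the names `compareFirstChar`, `FirstCharLess`, and the three `sortWith...` helpers are invented for illustration, and the comparison is a variant of the one in the question that also tolerates empty strings.

```cpp
#include <algorithm>
#include <string>
#include <type_traits>
#include <vector>

// Variant of the question's comparison; empty strings sort first (an added assumption).
inline bool compareFirstChar(const std::string& a, const std::string& b)
{
    if (a.empty() || b.empty()) return a.empty() && !b.empty();
    return a[0] < b[0];
}

struct FirstCharLess
{
    bool operator()(const std::string& a, const std::string& b) const
    {
        return compareFirstChar(a, b);
    }
};

// A function name decays to a pointer: the comparator type only says
// "some function with this signature", not which one.
static_assert(std::is_same<std::decay<decltype(compareFirstChar)>::type,
                           bool (*)(const std::string&, const std::string&)>::value,
              "a function decays to a plain function pointer");

// Comparator type is a function pointer; the target is a run-time value.
void sortWithPointer(std::vector<std::string>& v)
{
    std::sort(v.begin(), v.end(), compareFirstChar);
}

// Comparator type is FirstCharLess; the call target is fixed by the type.
void sortWithFunctor(std::vector<std::string>& v)
{
    std::sort(v.begin(), v.end(), FirstCharLess());
}

// Wrapping the function in a lambda gives the comparator a unique type too.
void sortWithWrappedFunction(std::vector<std::string>& v)
{
    std::sort(v.begin(), v.end(),
              [](const std::string& a, const std::string& b) { return compareFirstChar(a, b); });
}
```

All three produce the same ordering. Only the first one asks the optimizer to see through a pointer before it can inline the comparison, so wrapping an existing function in a lambda is a cheap way to keep the function and still give sort a statically known call target.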

Answer 3

You should call it a few thousand times to get more accurate results.
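
One way to do that is sketched below, using the portable std::chrono clock instead of QueryPerformanceCounter and assuming C++11 or later. The helper names `makeInput` and `bestSortTimeMs` are hypothetical, and reporting the fastest run rather than the average is my own choice here, not something taken from the question.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <limits>
#include <random>
#include <string>
#include <vector>

// Builds n strings of random digits; the fixed seed gives every comparator the same data.
std::vector<std::string> makeInput(std::size_t n)
{
    std::mt19937 rng(12345);
    std::vector<std::string> v;
    v.reserve(n);
    for (std::size_t i = 0; i < n; ++i)
        v.push_back(std::to_string(rng()));
    return v;
}

// Sorts a fresh copy of `input` `runs` times and returns the fastest run in milliseconds.
// Expects runs > 0; otherwise the initial sentinel value is returned unchanged.
template <typename Compare>
double bestSortTimeMs(const std::vector<std::string>& input, Compare comp, int runs)
{
    double best = std::numeric_limits<double>::max();
    for (int i = 0; i < runs; ++i)
    {
        std::vector<std::string> copy = input;  // copy is made outside the timed region
        const auto start = std::chrono::steady_clock::now();
        std::sort(copy.begin(), copy.end(), comp);
        const auto stop = std::chrono::steady_clock::now();
        const double ms = std::chrono::duration<double, std::milli>(stop - start).count();
        best = std::min(best, ms);
    }
    return best;
}
```

Calling something like `bestSortTimeMs(makeInput(100000), CompareFunction, 1000)` and then the same with `compareFunctor` or the lambda compares all three on identical input, and a single noisy run no longer decides the result.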

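How fast this is depends on how smart the compiler is. It may inline some of the operations (most likely lambdas, maybe functors, but not non-inline global functions). Whether the comparison gets inlined also depends on its complexity, so results will vary.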

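I strongly recommend not looking at such detailed "optimizations". The programming time far outweighs the (very small) gain at run time. Focus on writing clean, understandable, simple code. Trying to understand code "tweaked for ultimate speed" a week later will make you go bald prematurely.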

Answer 1
2 2014-01-29 15:57:29

Answer 2
2 2014-01-29 16:25:25

Answer 3
0 2014-01-29 12:55:49