Is C++11 std::function slower than virtual calls?

Question

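I am creating a mechanism that allows users to form arbitrarily complex functions from basic building blocks using the decorator pattern. This works fine functionality-wise, but I don't like the fact that it involves a lot of virtual calls, particularly when the nesting depth becomes large. It worries me because the complex function may be called often (>100,000 times).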

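To avoid this problem, I tried to turn the decorator scheme into a std::function once it was finished (cf. to_function() in the SSCCE). All internal function calls are wired up during construction of the std::function. I figured this would be faster to evaluate than the original decorator scheme, because no virtual lookups need to be performed in the std::function version.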

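Alas, benchmarks prove me wrong: the decorator scheme is in fact faster than the std::function I built from it. So now I am left wondering why. Maybe my test setup is faulty, since I only use two trivial basic functions, which means the vtable lookups may be cached?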

The code I used is included below; unfortunately it is rather long.

SSCCE

// sscce.cpp
#include <iostream>
#include <vector>
#include <memory>
#include <functional>
#include <random>

/**
 * Base class for Pipeline scheme (implemented via decorators)
 */
class Pipeline {
protected:
    std::unique_ptr<Pipeline> wrappee;
    Pipeline(std::unique_ptr<Pipeline> wrap)
    :wrappee(std::move(wrap)){}
    Pipeline():wrappee(nullptr){}

public:
    typedef std::function<double(double)> FnSig;
    double operator()(double input) const{
        if(wrappee.get()) input=wrappee->operator()(input);
        return process(input);
    }

    virtual double process(double input) const=0;
    virtual ~Pipeline(){}

    // Returns a std::function which contains the entire Pipeline stack.
    virtual FnSig to_function() const=0;
};

/**
 * CRTP for to_function().
 */
template <class Derived>
class Pipeline_CRTP : public Pipeline{
protected:
    Pipeline_CRTP(const Pipeline_CRTP<Derived> &o):Pipeline(o){}
    Pipeline_CRTP(std::unique_ptr<Pipeline> wrappee)
    :Pipeline(std::move(wrappee)){}
    Pipeline_CRTP():Pipeline(){};
public:
    typedef typename Pipeline::FnSig FnSig;

    FnSig to_function() const override{
        if(Pipeline::wrappee.get()!=nullptr){

            FnSig wrapfun = Pipeline::wrappee->to_function();
            FnSig processfun = std::bind(&Derived::process,
                static_cast<const Derived*>(this),
                std::placeholders::_1);
            FnSig fun = [=](double input){
                return processfun(wrapfun(input));
            };
            return std::move(fun);

        }else{

            FnSig processfun = std::bind(&Derived::process,
                static_cast<const Derived*>(this),
                std::placeholders::_1);
            FnSig fun = [=](double input){
                return processfun(input);
            };
            return std::move(fun);
        }

    }

    virtual ~Pipeline_CRTP(){}
};

/**
 * First concrete derived class: simple scaling.
 */
class Scale: public Pipeline_CRTP<Scale>{
private:
    double scale_;
public:
    Scale(std::unique_ptr<Pipeline> wrap, double scale) // todo move
:Pipeline_CRTP<Scale>(std::move(wrap)),scale_(scale){}
    Scale(double scale):Pipeline_CRTP<Scale>(),scale_(scale){}

    double process(double input) const override{
        return input*scale_;
    }
};

/**
 * Second concrete derived class: offset.
 */
class Offset: public Pipeline_CRTP<Offset>{
private:
    double offset_;
public:
    Offset(std::unique_ptr<Pipeline> wrap, double offset) // todo move
:Pipeline_CRTP<Offset>(std::move(wrap)),offset_(offset){}
    Offset(double offset):Pipeline_CRTP<Offset>(),offset_(offset){}

    double process(double input) const override{
        return input+offset_;
    }
};

int main(){

    // used to make a random function / arguments
    // to prevent gcc from being overly clever
    std::default_random_engine generator;
    auto randint = std::bind(std::uniform_int_distribution<int>(0,1),std::ref(generator));
    auto randdouble = std::bind(std::normal_distribution<double>(0.0,1.0),std::ref(generator));

    // make a complex Pipeline
    std::unique_ptr<Pipeline> pipe(new Scale(randdouble()));
    for(unsigned i=0;i<100;++i){
        if(randint()) pipe=std::move(std::unique_ptr<Pipeline>(new Scale(std::move(pipe),randdouble())));
        else pipe=std::move(std::unique_ptr<Pipeline>(new Offset(std::move(pipe),randdouble())));
    }

    // make a std::function from pipe
    Pipeline::FnSig fun(pipe->to_function());   

    double bla=0.0;
    for(unsigned i=0; i<100000; ++i){
#ifdef USE_FUNCTION
        // takes 110 ms on average
        bla+=fun(bla);
#else
        // takes 60 ms on average
        bla+=pipe->operator()(bla);
#endif
    }   
    std::cout << bla << std::endl;
}

Benchmark

Using pipe:

g++ -std=gnu++11 sscce.cpp -march=native -O3
sudo nice -3 /usr/bin/time ./a.out
-> 60 ms

Using fun:

g++ -DUSE_FUNCTION -std=gnu++11 sscce.cpp -march=native -O3
sudo nice -3 /usr/bin/time ./a.out
-> 110 ms

Answer 1

You have std::functions binding lambdas that call std::functions that bind lambdas that call std::functions that ...

Look at your to_function. It creates a lambda that calls two std::functions, and returns that lambda bound into another std::function. The compiler won't resolve any of these statically.

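So in the end, you end up with just as many indirect calls as the virtual function solution, and that is only if you get rid of the bound processfun and call process directly in the lambda. Otherwise you have twice as many.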

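If you want a speedup, you will have to create the entire pipeline in a way that can be statically resolved, and that means a lot more templates before you finally erase the type into a single std::function.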
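As a rough illustration of that idea, here is a minimal sketch in which the stages are composed through templates and type erasure happens only once, at the end. The names ScaleStage, OffsetStage, Composed, then and make_pipeline are hypothetical and do not appear in the question's code. The sketch also assumes the shape of the pipeline is known at compile time, which is not the case for the randomly generated pipeline in the SSCCE.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical building blocks: plain structs, no virtual functions.
struct ScaleStage {
    double factor;
    double operator()(double x) const { return x * factor; }
};

struct OffsetStage {
    double amount;
    double operator()(double x) const { return x + amount; }
};

// Composed<Inner, Outer> computes outer(inner(x)). Both types are known
// statically, so the compiler is free to inline both calls.
template <class Inner, class Outer>
struct Composed {
    Inner inner;
    Outer outer;
    double operator()(double x) const { return outer(inner(x)); }
};

// Helper that deduces the stage types (hypothetical name).
template <class Inner, class Outer>
Composed<Inner, Outer> then(Inner inner, Outer outer) {
    return Composed<Inner, Outer>{std::move(inner), std::move(outer)};
}

// The type is erased exactly once, when the finished composition is
// converted to std::function. Calling the result costs one indirect call.
inline std::function<double(double)> make_pipeline() {
    return then(then(ScaleStage{2.0}, OffsetStage{1.0}), ScaleStage{0.5});
}
```

Calling the result of make_pipeline() with 3.0 evaluates ((3.0 * 2.0) + 1.0) * 0.5. The trade-off is that every distinct pipeline shape becomes its own type, so a pipeline assembled at run time cannot be expressed this way without some remaining indirection.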

Answer 2

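As Sebastian Redl's answer says, your "alternative" to virtual functions adds several layers of indirection through dynamically bound functions (either virtual, or through function pointers, depending on the std::function implementation), and then it still calls the virtual Pipeline::process(double) function anyway!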

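This modification makes it significantly faster, by removing one layer of std::function indirection and preventing the call to Derived::process from being virtual: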

FnSig to_function() const override {
    FnSig fun;
    auto derived_this = static_cast<const Derived*>(this);
    if (Pipeline::wrappee) {
        FnSig wrapfun = Pipeline::wrappee->to_function();
        fun = [=](double input){
            return derived_this->Derived::process(wrapfun(input));
        };
    } else {
        fun = [=](double input){
            return derived_this->Derived::process(input);
        };
    }
    return fun;
}

There is still more work being done here than in the virtual function version, though.

Answer 3

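std::function is notoriously slow; type erasure and the resulting allocation play a part in this, and with gcc, calls through it are inlined and optimized poorly. For this reason there exists a plethora of C++ "delegates" with which people attempt to solve this problem. I ported one to Code Review: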

https://codereview.stackexchange.com/questions/14730/impossibly-fast-delegate-in-c11

But you can find plenty of others with Google, or write your own.

EDIT:

These days, have a look here for a fast delegate.
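For readers who want to see what "write your own" could look like, here is a minimal sketch of a non-owning delegate for the fixed signature double(double). It is not the code from the Code Review post. The names Delegate, method_stub, from and Halver are hypothetical. The delegate stores a raw pointer, so the caller has to keep the target object alive for as long as the delegate is used.

```cpp
#include <cassert>

// Hypothetical non-owning delegate: one object pointer plus one plain
// function pointer. No heap allocation and no virtual dispatch.
class Delegate {
    using Stub = double (*)(const void*, double);

    const void* object_;
    Stub stub_;

    // One stub is instantiated per (type, member function) pair. The member
    // function is a template argument, so the call inside can be inlined.
    template <class T, double (T::*Method)(double) const>
    static double method_stub(const void* obj, double x) {
        return (static_cast<const T*>(obj)->*Method)(x);
    }

    Delegate(const void* object, Stub stub) : object_(object), stub_(stub) {}

public:
    // Binds a const member function of obj; obj must outlive the delegate.
    template <class T, double (T::*Method)(double) const>
    static Delegate from(const T& obj) {
        return Delegate(&obj, &method_stub<T, Method>);
    }

    double operator()(double x) const { return stub_(object_, x); }
};

// Hypothetical target type used only to exercise the delegate.
struct Halver {
    double process(double x) const { return x * 0.5; }
};
```

A delegate is created with Delegate::from<Halver, &Halver::process>(h) for some Halver h and then called like a function. Each call still goes through one function pointer, so this removes the allocation and the copy cost, not the indirection itself.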

Answer 4

The libstdc++ implementation of std::function goes roughly like this:

template<typename Signature>
struct Function
{
    Ptr functor;
    Ptr functor_manager;

    template<class Functor>
    Function(const Functor& f)
    {
        functor_manager = &FunctorManager<Functor>::manage;
        functor = new Functor(f);
    }

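    Function(const Function& that)
    {
        // the manager must be copied too, before it can be used to clone
        functor_manager = that.functor_manager;
        functor = functor_manager(CLONE, that.functor);
    }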

    R operator()(args) // Signature
    {
        return functor_manager(INVOKE, functor, args);
    }

    ~Function()
    {
        functor_manager(DESTROY, functor);
    }
};

template<class Functor>
struct FunctorManager
{
     static manage(int operation, Functor& f)
     {
         switch (operation)
         {
         case CLONE: call Functor copy constructor; break;
         case INVOKE: call Functor::operator(); break;
         case DESTROY: call Functor destructor; break;
         }
     }
};

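So although std::function does not know the exact type of the Functor object, it dispatches the important operations through the functor_manager function pointer, which points to a static function of a template instance that does know the Functor type.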
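The dispatch scheme described above can be sketched as compilable code. This is a simplified illustration, not libstdc++ source: the names MiniFunction, Op, invoke and manage are hypothetical, the signature is fixed to double(double), the functor is always heap-allocated, and invocation uses its own function pointer instead of sharing the manager.

```cpp
#include <cassert>
#include <utility>

enum class Op { Clone, Destroy };

// Hypothetical, simplified type-erased wrapper for double(double).
class MiniFunction {
    using Invoker = double (*)(void*, double);
    using Manager = void* (*)(Op, void*);

    void* functor_;    // erased pointer to the heap-allocated functor
    Invoker invoke_;   // knows how to call the concrete functor
    Manager manage_;   // knows how to clone and destroy it

    template <class F>
    static double invoke(void* f, double x) {
        return (*static_cast<F*>(f))(x);
    }

    template <class F>
    static void* manage(Op op, void* f) {
        switch (op) {
        case Op::Clone:
            return new F(*static_cast<F*>(f));
        case Op::Destroy:
            delete static_cast<F*>(f);
            return nullptr;
        }
        return nullptr;
    }

public:
    // The static functions instantiated here are the only code that knows F.
    template <class F>
    MiniFunction(F f)
        : functor_(new F(std::move(f))),
          invoke_(&invoke<F>),
          manage_(&manage<F>) {}

    MiniFunction(const MiniFunction& other)
        : functor_(other.manage_(Op::Clone, other.functor_)),
          invoke_(other.invoke_),
          manage_(other.manage_) {}

    MiniFunction& operator=(const MiniFunction&) = delete;

    ~MiniFunction() { manage_(Op::Destroy, functor_); }

    double operator()(double x) const { return invoke_(functor_, x); }
};
```

Wrapping a lambda such as [scale](double x) { return x * scale; } in a MiniFunction and then copying the wrapper exercises both paths: the call goes through invoke_, and the copy goes through manage_ with Op::Clone.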

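Each std::function instance allocates its own copy of the functor object on the heap (unless the functor is no bigger than a pointer, such as a function pointer, in which case it just holds it as a subobject).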

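The important takeaway is that copying a std::function is expensive if the underlying functor object has an expensive copy constructor and/or takes up a lot of space (for example, to hold bound parameters).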
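One way to observe this copy cost is to wrap a functor that counts its own copy constructions. The sketch below is illustrative only: CountingFunctor and copies_per_function_copy are hypothetical names, and the payload size is an arbitrary choice meant to stand in for a functor that carries bound state.

```cpp
#include <cassert>
#include <functional>

// Hypothetical functor that counts how often it is copy-constructed.
struct CountingFunctor {
    static int copies;
    double payload[16];  // stands in for bound arguments or captured state

    CountingFunctor() : payload() {}

    CountingFunctor(const CountingFunctor& other) {
        for (int i = 0; i < 16; ++i) payload[i] = other.payload[i];
        ++copies;
    }

    double operator()(double x) const { return x + payload[0]; }
};

int CountingFunctor::copies = 0;

// Returns how many functor copies were triggered by copying one
// std::function that holds a CountingFunctor.
inline int copies_per_function_copy() {
    std::function<double(double)> f = CountingFunctor();
    const int before = CountingFunctor::copies;
    std::function<double(double)> g = f;  // copies the stored functor
    (void)g;
    return CountingFunctor::copies - before;
}
```

Copying the std::function has to duplicate the stored functor, so copies_per_function_copy() returns at least 1. In to_function from the question, every [=] lambda captures wrapfun and processfun by value, so each level of the pipeline copies the std::function objects built so far.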

4 answers

Answer 1
Score 23, 2013-09-04 09:07:29

Answer 2
Score 18, accepted, 2013-09-04 12:12:35

Answer 3
Score 8, 2013-09-04 09:36:10

Answer 4
Score 6, 2013-09-04 11:30:37
