Converting a recursive variadic template function to an iterative one

Question

Say I have the following struct:

#include <functional>
#include <stdint.h>  // uint8_t
#include <string.h>  // memcpy
#include <utility>   // std::forward

template <typename ...T>
struct Unpack;

// specialization case for float
template <typename ...Tail>
struct Unpack<float, Tail...>
{
    static void unpack(std::function<void(float, Tail...)> f, uint8_t *dataOffset)
    {
        float val;
        memcpy(&val, dataOffset, sizeof(float));

        auto g = [&](Tail&& ...args)
        {
            f(val, std::forward<Tail>(args)...);
        };

        Unpack<Tail...>::unpack(std::function<void(Tail...)>{g}, dataOffset + sizeof(float));
    }
};

// base recursive case
template <typename Head, typename ... Tail>
struct Unpack<Head, Tail...>
{
    static void unpack(std::function<void(Head, Tail...)> f, uint8_t *dataOffset)
    {
        Head val;
        memcpy(&val, dataOffset, sizeof(Head));

        auto g = [&](Tail&& ...args)
        {
            f(val, std::forward<Tail>(args)...);
        };

        Unpack<Tail...>::unpack(std::function<void(Tail...)>{g}, dataOffset + sizeof(Head));
    }
};

// end of recursion
template <>
struct Unpack<>
{
    static void unpack(std::function<void()> f, uint8_t *)
    {
        f(); // call the function
    }
};

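All it does is take a std::function and a byte array, split chunks off the byte array, and recursively bind those chunks as arguments of the function until every argument has been applied. Then it calls the function.

The problem I have is that it generates a lot of template instantiations. This is especially noticeable when it is used extensively in debug builds: the binary grows very quickly.

Given the following use case: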

#include <iostream>
#include <string.h>

using namespace std;


void foo1(uint8_t a, int8_t b, uint16_t c, int16_t d, uint32_t e, int32_t f, uint64_t g, int64_t h, float i, double j)
{
    cout << a << "; " << b << "; " << c << "; " << d << "; " << e << "; " << f << "; " << g << "; " << h << "; " << i << "; " << j << endl;
}

void foo2(uint8_t a, int8_t b, uint16_t c, int16_t d, uint32_t e, int32_t f, int64_t g, uint64_t h, float i, double j)
{
    cout << a << "; " << b << "; " << c << "; " << d << "; " << e << "; " << f << "; " << g << "; " << h << "; " << i << "; " << j << endl;
}

int main()
{
    uint8_t *buff = new uint8_t[512];
    uint8_t *offset = buff;

    uint8_t a = 1;
    int8_t b = 2;
    uint16_t c = 3;
    int16_t d = 4;
    uint32_t e = 5;
    int32_t f = 6;
    uint64_t g = 7;
    int64_t h = 8;
    float i = 9.123456789;
    double j = 10.123456789;

    memcpy(offset, &a, sizeof(a));
    offset += sizeof(a);
    memcpy(offset, &b, sizeof(b));
    offset += sizeof(b);
    memcpy(offset, &c, sizeof(c));
    offset += sizeof(c);
    memcpy(offset, &d, sizeof(d));
    offset += sizeof(d);
    memcpy(offset, &e, sizeof(e));
    offset += sizeof(e);
    memcpy(offset, &f, sizeof(f));
    offset += sizeof(f);
    memcpy(offset, &g, sizeof(g));
    offset += sizeof(g);
    memcpy(offset, &h, sizeof(h));
    offset += sizeof(h);
    memcpy(offset, &i, sizeof(i));
    offset += sizeof(i);
    memcpy(offset, &j, sizeof(j));

    std::function<void (uint8_t, int8_t, uint16_t, int16_t, uint32_t, int32_t, uint64_t, int64_t, float, double)> ffoo1 = foo1;
    Unpack<uint8_t, int8_t, uint16_t, int16_t, uint32_t, int32_t, uint64_t, int64_t, float, double>::unpack(ffoo1, buff);

    // uint64_t and int64_t are switched
    //std::function<void (uint8_t, int8_t, uint16_t, int16_t, uint32_t, int32_t, int64_t, uint64_t, float, double)> ffoo2 = foo2;
    //Unpack<uint8_t, int8_t, uint16_t, int16_t, uint32_t, int32_t, int64_t, uint64_t, float, double>::unpack(ffoo2, buff);

    return 0;
}

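With the two lines commented out, the debug binary is 264.4 KiB. With them uncommented it grows to 447.7 KiB, roughly 70% larger.

Release mode shows the same pattern: 37.5 KiB versus 59.0 KiB, roughly 60% larger.

It would make sense to replace the recursion with iteration, something like an initializer list applied to the variadic Unpack<...>::unpack(), so that C++ generates only one instantiation per type.

The code above compiles fine if you want to play with it a bit.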

Answer 1

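I wrote something crazy with templates, index sequences and tuples, fully concept-constrained with range-v3, and it was good. Then it occurred to me that the compiler would have a far easier time optimizing if the parameters were unpacked directly into the function call. First we create a class that can deserialize any POD type from a byte pointer (this could easily be relaxed to trivially copyable types):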

struct deserializer {
  const std::uint8_t* in_;

  deserializer(const std::uint8_t* in) : in_{in} {}

  template <typename T>
  operator T() {
    static_assert(std::is_pod<T>(), "");
    T t;
    std::memcpy(&t, in_, sizeof(T));
    in_ += sizeof(T);
    return t;
  }
};

Then you can implement unpack generically as:

template <typename...Ts, typename F>
void unpack(F&& f, const std::uint8_t* from) {
  deserializer d{from};
  std::forward<F>(f)(static_cast<Ts>(d)...); // Oops, broken.
}

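This has unspecified behavior, because the order in which function arguments are evaluated is unspecified, and every static_cast<Ts>(d) advances the read position inside d. Let's introduce a type that forwards the arguments to the function, so that we can use brace initialization to force left-to-right evaluation: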

struct forwarder {
  template <typename F, typename...Ts>
  forwarder(F&& f, Ts&&...ts) {
    std::forward<F>(f)(std::forward<Ts>(ts)...);
  }
};

// Requires explicit specification of argument types.
template <typename...Ts, typename F>
void unpack(F&& f, const std::uint8_t* from) {
  deserializer d{from};
  forwarder{std::forward<F>(f), static_cast<Ts>(d)...};
}
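
To make the ordering guarantee concrete, here is a minimal sketch. The helper names `next_value` and `braced_order` are invented for this illustration and are not part of the answer. It relies on the rule that the elements of a braced-init-list are evaluated strictly left to right, even when the list ends up calling a constructor.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Same forwarder as above: the constructor body performs the call.
struct forwarder {
  template <typename F, typename... Ts>
  forwarder(F&& f, Ts&&... ts) {
    std::forward<F>(f)(std::forward<Ts>(ts)...);
  }
};

// Hypothetical helper: returns 0, 1, 2, ... on successive calls.
inline int next_value(int& counter) { return counter++; }

// Hypothetical demo: reports the values in the order the callee received them.
inline std::vector<int> braced_order() {
  int counter = 0;
  std::vector<int> seen;
  auto collect = [&seen](int a, int b, int c) { seen = {a, b, c}; };
  // Braced initialization: the three next_value calls run left to right.
  forwarder{collect, next_value(counter), next_value(counter), next_value(counter)};
  assert(seen.size() == 3);
  return seen;
}
```

Writing `collect(next_value(counter), next_value(counter), next_value(counter))` instead would leave the order of the three calls up to the compiler, which is exactly the problem with the first version of unpack.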

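We can also throw in a couple of overloads that deduce the argument types from function pointers and std::function, so that we don't always have to specify them: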

// Deduce argument types from std::function
template <typename R, typename...Args>
void unpack(std::function<R(Args...)> f, const std::uint8_t* from) {
  unpack<Args...>(std::move(f), from);
}

// Deduce argument types from function pointer
template <typename R, typename...Args>
void unpack(R (*f)(Args...), const std::uint8_t* from) {
  unpack<Args...>(f, from);
}
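
As a usage illustration, the sketch below assembles the pieces above into one translation unit and calls unpack once through a function pointer and once through a std::function. The names `received`, `packed`, `make_buffer`, `record` and the two `demo_` functions are assumptions made for this example, and the static_assert uses std::is_trivially_copyable instead of std::is_pod, following the relaxation mentioned earlier.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <functional>
#include <type_traits>
#include <utility>

struct deserializer {
  const std::uint8_t* in_;
  deserializer(const std::uint8_t* in) : in_{in} {}
  template <typename T>
  operator T() {
    static_assert(std::is_trivially_copyable<T>::value, "memcpy needs this");
    T t;
    std::memcpy(&t, in_, sizeof(T));
    in_ += sizeof(T);
    return t;
  }
};

struct forwarder {
  template <typename F, typename... Ts>
  forwarder(F&& f, Ts&&... ts) { std::forward<F>(f)(std::forward<Ts>(ts)...); }
};

template <typename... Ts, typename F>
void unpack(F&& f, const std::uint8_t* from) {
  deserializer d{from};
  forwarder{std::forward<F>(f), static_cast<Ts>(d)...};
}

template <typename R, typename... Args>
void unpack(std::function<R(Args...)> f, const std::uint8_t* from) {
  unpack<Args...>(std::move(f), from);
}

template <typename R, typename... Args>
void unpack(R (*f)(Args...), const std::uint8_t* from) {
  unpack<Args...>(f, from);
}

// Everything below is hypothetical demo scaffolding.
struct received { std::uint8_t a; std::int16_t b; float c; };
struct packed { std::uint8_t bytes[sizeof(std::uint8_t) + sizeof(std::int16_t) + sizeof(float)]; };

// Serializes the three values back to back, with no padding.
inline packed make_buffer(std::uint8_t a, std::int16_t b, float c) {
  packed p{};
  std::uint8_t* out = p.bytes;
  std::memcpy(out, &a, sizeof a); out += sizeof a;
  std::memcpy(out, &b, sizeof b); out += sizeof b;
  std::memcpy(out, &c, sizeof c);
  return p;
}

inline received& last_call() { static received r{}; return r; }
inline void record(std::uint8_t a, std::int16_t b, float c) { last_call() = received{a, b, c}; }

inline received demo_function_pointer(const std::uint8_t* buf) {
  assert(buf != nullptr);
  last_call() = received{};
  unpack(&record, buf);  // argument types deduced from the pointer
  return last_call();
}

inline received demo_std_function(const std::uint8_t* buf) {
  assert(buf != nullptr);
  received r{};
  std::function<void(std::uint8_t, std::int16_t, float)> fn =
      [&r](std::uint8_t a, std::int16_t b, float c) { r = received{a, b, c}; };
  unpack(fn, buf);  // argument types deduced from the std::function
  return r;
}
```

Both demo functions take a buffer produced by `make_buffer` and hand back the three values that the callee saw.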

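All of this is nicely exposed to the compiler and very optimizable. The change in binary size between the single-call and two-call versions is small (borrowing T.C.'s test framework):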

With function pointers: about 2K at -O0, 64B at -O3.

With std::function: about 3K at -O0, 216B at -O3.

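The code that unpacks and calls is a few dozen assembly instructions. For example, on x64 with gcc 4.9.2 optimizing for size with -Os, the explicit instantiation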

template void unpack(decltype(foo1), const std::uint8_t*);

assembles to:

pushq   %rax
movq    %rsi, %rax
movswl  4(%rsi), %ecx
movzwl  2(%rsi), %edx
movq    %rdi, %r10
movsbl  1(%rsi), %esi
movzbl  (%rax), %edi
pushq   22(%rax)
pushq   14(%rax)
movl    10(%rax), %r9d
movl    6(%rax), %r8d
movsd   34(%rax), %xmm1
movss   30(%rax), %xmm0
call    *(%r10)
addq    $24, %rsp
ret

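The code is small enough to be inlined effectively, so the number of templates generated is not really a factor.

EDIT: Generalization to non-PODs.

Wrapping the input pointer in a deserializer and using the conversion operator to perform the actual unpacking is "clever" (in both the positive and the negative sense of the word), but it is not extensible. Client code cannot add operator blahblah member function overloads, and the only way to control the conversion operator overloads is with piles of SFINAE. Yuck. So let's throw away the deserializer idea and use an extensible dispatch mechanism.

First, a metafunction that removes references and cv-qualifiers, so that we can unpack, for example, a std::vector<double> when the parameter signature is const std::vector<double>&: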

template <typename T>
using uncvref =
  typename std::remove_cv<
    typename std::remove_reference<T>::type
  >::type;
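
A quick compile-time check of what the alias produces, as a sketch. The alias is repeated so the block stands alone, and the sample types are arbitrary choices for illustration. From C++20 on, std::remove_cvref_t performs the same transformation.

```cpp
#include <type_traits>
#include <vector>

template <typename T>
using uncvref =
  typename std::remove_cv<
    typename std::remove_reference<T>::type
  >::type;

// The reference is removed first, then the top-level cv-qualifiers.
static_assert(std::is_same<uncvref<const std::vector<double>&>,
                           std::vector<double>>::value,
              "const T& becomes T");
static_assert(std::is_same<uncvref<volatile int&&>, int>::value,
              "volatile T&& becomes T");
// Only top-level qualifiers are stripped: the pointee stays const.
static_assert(std::is_same<uncvref<const char*>, const char*>::value,
              "pointee constness is untouched");
```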

I'm a big fan of tag dispatching, so let's whip up a tag wrapper that can hold any type:

template <typename T> struct arg_tag {};

Then we can use a generic argument-unpacking function that performs the tag dispatch:

template <typename T>
uncvref<T> unpack_arg(const std::uint8_t*& from) {
  return unpack_arg(arg_tag<uncvref<T>>{}, from);
}

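Thanks to the magic of argument-dependent lookup, overloads of unpack_arg that are declared after the definition of the dispatcher are still found, as long as they are declared before the dispatcher is actually used. In other words, the dispatch system is easy to extend. We'll provide the unpacker for PODs: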

template <typename T, typename std::enable_if<std::is_trivial<T>::value, int>::type = 0>
T unpack_arg(arg_tag<T>, const std::uint8_t*& from) {
  T t;
  std::memcpy(&t, from, sizeof(T));
  from += sizeof(T);
  return t;
}

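Technically this overload matches any arg_tag, but SFINAE removes it from overload resolution when the matched type is non-trivial. (Yes, I know I said POD before. I changed my mind: trivial types are a bit more general and still memcpy-able.) The front end of the dispatch mechanism hardly changes: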

struct forwarder {
  template <typename F, typename...Args>
  forwarder(F&& f, Args&&...args) {
    std::forward<F>(f)(std::forward<Args>(args)...);
  }
};

// Requires explicit specification of argument types.
template <typename...Ts, typename F>
void unpack(F&& f, const std::uint8_t* from) {
  forwarder{std::forward<F>(f), unpack_arg<Ts>(from)...};
}

forwarder is unchanged, and the unpack<Types...>() API uses unpack_arg<Ts>(from)... in place of static_cast<Ts>(d)..., but clearly still has the same structure. The type-deducing overloads:

template <typename R, typename...Args>
void unpack(std::function<R(Args...)> f, const std::uint8_t* from) {
  unpack<Args...>(std::move(f), from);
}

template <typename R, typename...Args>
void unpack(R (*f)(Args...), const std::uint8_t* from) {
  unpack<Args...>(f, from);
}

These work unchanged. Now we can provide an extension that unpacks vectors by overloading unpack_arg for arg_tag<std::vector<T>>:

using vec_size_t = int;

template <typename T>
std::vector<T> unpack_arg(arg_tag<std::vector<T>>, const std::uint8_t*& from) {
  std::vector<T> vec;
  auto n = unpack_arg<vec_size_t>(from);
  vec.reserve(n);
  std::generate_n(std::back_inserter(vec), n, [&from]{
    return unpack_arg<T>(from);
  });
  return vec;
}

Note how the vector unpacking overload unpacks its components through the dispatcher: unpack_arg<vec_size_t>(from) for the size, and unpack_arg<T>(from) for each element.
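
The following sketch runs the tag-dispatch version end to end on a buffer that holds a vector followed by an integer. The wire layout (an int element count, then the raw doubles, then an int32 scale factor) and the function `demo_vector` are assumptions made for this example. The element count is not validated, so it is only suitable for trusted input.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <iterator>
#include <type_traits>
#include <utility>
#include <vector>

template <typename T>
using uncvref = typename std::remove_cv<typename std::remove_reference<T>::type>::type;

template <typename T> struct arg_tag {};

// Dispatcher: forwards to whichever tagged overload matches.
template <typename T>
uncvref<T> unpack_arg(const std::uint8_t*& from) {
  return unpack_arg(arg_tag<uncvref<T>>{}, from);
}

// Trivial types: plain memcpy.
template <typename T, typename std::enable_if<std::is_trivial<T>::value, int>::type = 0>
T unpack_arg(arg_tag<T>, const std::uint8_t*& from) {
  T t;
  std::memcpy(&t, from, sizeof(T));
  from += sizeof(T);
  return t;
}

using vec_size_t = int;

// Vectors: element count first, then the elements themselves.
template <typename T>
std::vector<T> unpack_arg(arg_tag<std::vector<T>>, const std::uint8_t*& from) {
  std::vector<T> vec;
  auto n = unpack_arg<vec_size_t>(from);
  vec.reserve(n);
  std::generate_n(std::back_inserter(vec), n, [&from] { return unpack_arg<T>(from); });
  return vec;
}

struct forwarder {
  template <typename F, typename... Args>
  forwarder(F&& f, Args&&... args) { std::forward<F>(f)(std::forward<Args>(args)...); }
};

template <typename... Ts, typename F>
void unpack(F&& f, const std::uint8_t* from) {
  forwarder{std::forward<F>(f), unpack_arg<Ts>(from)...};
}

// Hypothetical demo: serialize {1.5, 2.5, 4.0} and a scale of 2, then unpack.
inline double demo_vector() {
  const std::vector<double> values{1.5, 2.5, 4.0};
  const vec_size_t count = static_cast<vec_size_t>(values.size());
  const std::int32_t scale = 2;
  assert(count == 3);

  std::vector<std::uint8_t> buf(sizeof count + values.size() * sizeof(double) + sizeof scale);
  std::uint8_t* p = buf.data();
  std::memcpy(p, &count, sizeof count);
  p += sizeof count;
  std::memcpy(p, values.data(), values.size() * sizeof(double));
  p += values.size() * sizeof(double);
  std::memcpy(p, &scale, sizeof scale);

  double total = 0;
  auto sum_scaled = [&total](const std::vector<double>& v, std::int32_t s) {
    for (double x : v) total += x * s;
  };
  unpack<std::vector<double>, std::int32_t>(sum_scaled, buf.data());
  return total;
}
```

Supporting another container or a user-defined type would follow the same pattern: one more unpack_arg overload for the matching arg_tag, with no change to the dispatcher or to unpack.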

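EDIT again: `std::function<void()>`

The code now has a problem: if f is a std::function<void()> or a void(*)(void), the unpack overloads that deduce the argument types from f will call themselves and recurse infinitely. The simplest fix is to give the function that does the actual unpacking work its own name (I'll choose unpack_explicit) and call it from the various unpack front ends: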

template <typename...Ts, typename F>
void unpack_explicit(F&& f, const std::uint8_t* from) {
  forwarder{std::forward<F>(f), unpack_arg<Ts>(from)...};
}

// Requires explicit specification of argument types.
template <typename...Ts, typename F>
void unpack(F&& f, const std::uint8_t* from) {
  unpack_explicit<Ts...>(std::forward<F>(f), from);
}

// Deduce argument types from std::function
template <typename R, typename...Args>
void unpack(std::function<R(Args...)> f, const std::uint8_t* from) {
  unpack_explicit<Args...>(std::move(f), from);
}

// Deduce argument types from function pointer
template <typename R, typename...Args>
void unpack(R (*f)(Args...), const std::uint8_t* from) {
  unpack_explicit<Args...>(f, from);
}

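That puts all the pieces together. If you would rather get a compile error for functions whose return type is not void, remove the R parameter that deduces the return type from the deducing overloads and simply use void: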

// Deduce argument types from std::function
template <typename...Args>
void unpack(std::function<void(Args...)> f, const std::uint8_t* from) {
  unpack_explicit<Args...>(std::move(f), from);
}

// Deduce argument types from function pointer
template <typename...Args>
void unpack(void (*f)(Args...), const std::uint8_t* from) {
  unpack_explicit<Args...>(f, from);
}
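
As a check on the zero-argument case, here is a condensed sketch of the final design with only the void-returning function-pointer front end. The targets `ping` and `store_sum`, their accessor functions and the two `demo_` functions are hypothetical and exist only for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <utility>

template <typename T>
using uncvref = typename std::remove_cv<typename std::remove_reference<T>::type>::type;

template <typename T> struct arg_tag {};

template <typename T>
uncvref<T> unpack_arg(const std::uint8_t*& from) {
  return unpack_arg(arg_tag<uncvref<T>>{}, from);
}

template <typename T, typename std::enable_if<std::is_trivial<T>::value, int>::type = 0>
T unpack_arg(arg_tag<T>, const std::uint8_t*& from) {
  T t;
  std::memcpy(&t, from, sizeof(T));
  from += sizeof(T);
  return t;
}

struct forwarder {
  template <typename F, typename... Args>
  forwarder(F&& f, Args&&... args) { std::forward<F>(f)(std::forward<Args>(args)...); }
};

// Does the real work; has a distinct name so the front end cannot call itself.
template <typename... Ts, typename F>
void unpack_explicit(F&& f, const std::uint8_t* from) {
  forwarder{std::forward<F>(f), unpack_arg<Ts>(from)...};
}

// Front end: deduces the argument types from a void-returning function pointer.
template <typename... Args>
void unpack(void (*f)(Args...), const std::uint8_t* from) {
  unpack_explicit<Args...>(f, from);
}

// Hypothetical targets that record what happened.
inline int& ping_count() { static int n = 0; return n; }
inline void ping() { ++ping_count(); }
inline std::int32_t& last_sum() { static std::int32_t s = 0; return s; }
inline void store_sum(std::int16_t a, std::int16_t b) { last_sum() = a + b; }

// Zero arguments: Args is empty and nothing is read from the buffer.
inline int demo_zero_args() {
  ping_count() = 0;
  const std::uint8_t unused = 0;
  unpack(&ping, &unused);
  assert(ping_count() == 1);
  return ping_count();
}

// Two arguments read back to back from the buffer.
inline std::int32_t demo_two_args() {
  const std::int16_t values[2] = {40, 2};
  std::uint8_t buf[sizeof values];
  std::memcpy(buf, values, sizeof values);
  unpack(&store_sum, buf);
  return last_sum();
}
```

With the empty pack, `unpack_explicit<>` is a different function template from `unpack`, so the call terminates after invoking `ping` once.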

Answer 2

First, a function that does the actual unpacking. Specialize it as needed.

template<class T>
T do_unpack(uint8_t * data){
    T val;
    memcpy(&val, data, sizeof(T));
    return val;
}

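Next, compute the offset of the I-th element with a recursive template. This could also be written as an iterative C++14 constexpr function, but GCC 4.9 does not support that, and it does not seem to optimize the non-constexpr version very well either. A C++11 recursive constexpr function with a single return statement is not worth the trouble compared with the traditional approach.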

// compute the offset of the I-th element
template<size_t I, class T, class... Ts>
struct get_offset_temp {
    static constexpr size_t value = get_offset_temp<I-1, Ts...>::value + sizeof(T);
};

template<class T, class... Ts>
struct get_offset_temp<0, T, Ts...>{
    static constexpr size_t value = 0;
};
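
For compilers that do support C++14 constexpr, the iterative variant mentioned above might look like the sketch below. The name `offset_of` is an assumption for this illustration. The recursive template is repeated so that the two can be compared at compile time.

```cpp
#include <cstddef>
#include <cstdint>

// Recursive version from the answer, repeated for comparison.
template <std::size_t I, class T, class... Ts>
struct get_offset_temp {
  static constexpr std::size_t value = get_offset_temp<I - 1, Ts...>::value + sizeof(T);
};

template <class T, class... Ts>
struct get_offset_temp<0, T, Ts...> {
  static constexpr std::size_t value = 0;
};

// Hypothetical iterative alternative; requires C++14 relaxed constexpr.
template <class... Ts>
constexpr std::size_t offset_of(std::size_t index) {
  // The trailing zero keeps the array non-empty when Ts is empty.
  constexpr std::size_t sizes[] = {sizeof(Ts)..., 0};
  std::size_t offset = 0;
  for (std::size_t i = 0; i < index; ++i) offset += sizes[i];
  return offset;
}

// Both versions agree on every element of a sample type list.
static_assert(offset_of<std::uint8_t, std::int16_t, float, double>(0) ==
              get_offset_temp<0, std::uint8_t, std::int16_t, float, double>::value, "");
static_assert(offset_of<std::uint8_t, std::int16_t, float, double>(2) ==
              get_offset_temp<2, std::uint8_t, std::int16_t, float, double>::value, "");
static_assert(offset_of<std::uint8_t, std::int16_t, float, double>(3) ==
              get_offset_temp<3, std::uint8_t, std::int16_t, float, double>::value, "");
// An index equal to the number of types yields the total payload size.
static_assert(offset_of<std::uint8_t, std::int16_t, float, double>(4) ==
              1 + 2 + sizeof(float) + sizeof(double), "");
```

The iterative form instantiates one function per type list instead of one class per (index, type list suffix) pair, which fits the goal of generating fewer templates.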

Now, a function that retrieves the I-th argument using the computed offset:

template<size_t I, class... Ts>
std::tuple_element_t<I, std::tuple<Ts...>> unpack_arg(uint8_t *data){
     using T = std::tuple_element_t<I, std::tuple<Ts...>>;
     return do_unpack<T>(data + get_offset_temp<I, Ts...>::value);
}

Finally, unpack the arguments and call the function. To avoid copying f unnecessarily, I pass it by reference:

template<class... Ts, size_t... Is>
void unpack(const std::function<void(Ts...)> &f, uint8_t *dataOffset, std::index_sequence<Is...>){
    f(unpack_arg<Is, Ts...>(dataOffset)...);
}

And the function you actually call, which just constructs a compile-time integer sequence and calls the function above:

template<class... Ts>
void unpack(std::function<void(Ts...)> f, uint8_t *dataOffset){
    return unpack(f, dataOffset, std::index_sequence_for<Ts...>());
}
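
Put together, this approach can be exercised as in the sketch below (C++14). The pieces are copied from the snippets above with std:: qualification added. The `received` struct and `demo_index_sequence` are hypothetical names used only for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <tuple>
#include <utility>

template <class T>
T do_unpack(std::uint8_t* data) {
  T val;
  std::memcpy(&val, data, sizeof(T));
  return val;
}

template <std::size_t I, class T, class... Ts>
struct get_offset_temp {
  static constexpr std::size_t value = get_offset_temp<I - 1, Ts...>::value + sizeof(T);
};

template <class T, class... Ts>
struct get_offset_temp<0, T, Ts...> {
  static constexpr std::size_t value = 0;
};

template <std::size_t I, class... Ts>
std::tuple_element_t<I, std::tuple<Ts...>> unpack_arg(std::uint8_t* data) {
  using T = std::tuple_element_t<I, std::tuple<Ts...>>;
  return do_unpack<T>(data + get_offset_temp<I, Ts...>::value);
}

template <class... Ts, std::size_t... Is>
void unpack(const std::function<void(Ts...)>& f, std::uint8_t* dataOffset,
            std::index_sequence<Is...>) {
  // Each argument reads from its own precomputed offset, so the
  // evaluation order of the arguments does not matter here.
  f(unpack_arg<Is, Ts...>(dataOffset)...);
}

template <class... Ts>
void unpack(std::function<void(Ts...)> f, std::uint8_t* dataOffset) {
  return unpack(f, dataOffset, std::index_sequence_for<Ts...>());
}

// Hypothetical demo scaffolding.
struct received { std::uint8_t a; std::int16_t b; float c; };

inline received demo_index_sequence() {
  const std::uint8_t a = 7;
  const std::int16_t b = -300;
  const float c = 0.25f;
  std::uint8_t buf[sizeof a + sizeof b + sizeof c];
  std::memcpy(buf, &a, sizeof a);
  std::memcpy(buf + sizeof a, &b, sizeof b);
  std::memcpy(buf + sizeof a + sizeof b, &c, sizeof c);

  received r{};
  std::function<void(std::uint8_t, std::int16_t, float)> fn =
      [&r](std::uint8_t x, std::int16_t y, float z) { r = received{x, y, z}; };
  unpack(fn, buf);
  assert(r.a == a);
  return r;
}
```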


The difference in binary size between one call and two calls is about 1 KiB at -O3 and about 8 KiB at -O0.

index_sequence and friends are C++14 features, but they can be implemented in C++11. There are plenty of implementations on SO. For C++11, also replace tuple_element_t<...> with typename tuple_element<...>::type.
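
One possible C++11 replacement is sketched below. It lives in a made-up `compat` namespace to avoid clashing with the standard library, and it uses simple linear recursion, so it is only meant for short parameter lists.

```cpp
#include <cstddef>
#include <type_traits>

namespace compat {  // hypothetical namespace for this sketch

template <std::size_t... Is>
struct index_sequence {
  static constexpr std::size_t size() noexcept { return sizeof...(Is); }
};

// Each step prepends N-1 to the pack, until N reaches 0.
template <std::size_t N, std::size_t... Is>
struct make_index_sequence_impl : make_index_sequence_impl<N - 1, N - 1, Is...> {};

template <std::size_t... Is>
struct make_index_sequence_impl<0, Is...> {
  using type = index_sequence<Is...>;
};

template <std::size_t N>
using make_index_sequence = typename make_index_sequence_impl<N>::type;

template <class... Ts>
using index_sequence_for = make_index_sequence<sizeof...(Ts)>;

}  // namespace compat

static_assert(std::is_same<compat::make_index_sequence<3>,
                           compat::index_sequence<0, 1, 2>>::value,
              "indices count up from zero");
static_assert(std::is_same<compat::index_sequence_for<>,
                           compat::index_sequence<>>::value,
              "an empty pack gives an empty sequence");
```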
