
Convert a recursive variadic template function into an iterative one

Say I have the following struct

#include <cstdint>
#include <cstring>
#include <functional>

template <typename ...T>
struct Unpack;

// specialization case for float
template <typename ...Tail>
struct Unpack<float, Tail...>
{
    static void unpack(std::function<void(float, Tail...)> f, uint8_t *dataOffset)
    {
        float val;
        memcpy(&val, dataOffset, sizeof(float));

        auto g = [&](Tail&& ...args)
        {
            f(val, std::forward<Tail>(args)...);
        };

        Unpack<Tail...>::unpack(std::function<void(Tail...)>{g}, dataOffset + sizeof(float));
    }
};

// general recursive case
template <typename Head, typename ... Tail>
struct Unpack<Head, Tail...>
{
    static void unpack(std::function<void(Head, Tail...)> f, uint8_t *dataOffset)
    {
        Head val;
        memcpy(&val, dataOffset, sizeof(Head));

        auto g = [&](Tail&& ...args)
        {
            f(val, std::forward<Tail>(args)...);
        };

        Unpack<Tail...>::unpack(std::function<void(Tail...)>{g}, dataOffset + sizeof(Head));
    }
};

// end of recursion
template <>
struct Unpack<>
{
    static void unpack(std::function<void()> f, uint8_t *)
    {
        f(); // call the function
    }
};

All it does is take a std::function and a byte array, chunk off pieces of the byte array and apply those chunks as the function's arguments, recursively, until all arguments are bound, and then call the function.

The issue I'm having with this is that it generates quite a lot of template instantiations. This is especially noticeable when it's used extensively in debug mode -- it makes the binary grow very fast.

Given the following use case

#include <iostream>
#include <string.h>

using namespace std;


void foo1(uint8_t a, int8_t b, uint16_t c, int16_t d, uint32_t e, int32_t f, uint64_t g, int64_t h, float i, double j)
{
    cout << a << "; " << b << "; " << c << "; " << d << "; " << e << "; " << f << "; " << g << "; " << h << "; " << i << "; " << j << endl;
}

void foo2(uint8_t a, int8_t b, uint16_t c, int16_t d, uint32_t e, int32_t f, int64_t g, uint64_t h, float i, double j)
{
    cout << a << "; " << b << "; " << c << "; " << d << "; " << e << "; " << f << "; " << g << "; " << h << "; " << i << "; " << j << endl;
}

int main()
{
    uint8_t *buff = new uint8_t[512];
    uint8_t *offset = buff;

    uint8_t a = 1;
    int8_t b = 2;
    uint16_t c = 3;
    int16_t d = 4;
    uint32_t e = 5;
    int32_t f = 6;
    uint64_t g = 7;
    int64_t h = 8;
    float i = 9.123456789;
    double j = 10.123456789;

    memcpy(offset, &a, sizeof(a));
    offset += sizeof(a);
    memcpy(offset, &b, sizeof(b));
    offset += sizeof(b);
    memcpy(offset, &c, sizeof(c));
    offset += sizeof(c);
    memcpy(offset, &d, sizeof(d));
    offset += sizeof(d);
    memcpy(offset, &e, sizeof(e));
    offset += sizeof(e);
    memcpy(offset, &f, sizeof(f));
    offset += sizeof(f);
    memcpy(offset, &g, sizeof(g));
    offset += sizeof(g);
    memcpy(offset, &h, sizeof(h));
    offset += sizeof(h);
    memcpy(offset, &i, sizeof(i));
    offset += sizeof(i);
    memcpy(offset, &j, sizeof(j));

    std::function<void (uint8_t, int8_t, uint16_t, int16_t, uint32_t, int32_t, uint64_t, int64_t, float, double)> ffoo1 = foo1;
    Unpack<uint8_t, int8_t, uint16_t, int16_t, uint32_t, int32_t, uint64_t, int64_t, float, double>::unpack(ffoo1, buff);

    // uint64_t and int64_t are switched
    //std::function<void (uint8_t, int8_t, uint16_t, int16_t, uint32_t, int32_t, int64_t, uint64_t, float, double)> ffoo2 = foo2;
    //Unpack<uint8_t, int8_t, uint16_t, int16_t, uint32_t, int32_t, int64_t, uint64_t, float, double>::unpack(ffoo2, buff);

    return 0;
}

With the two lines commented out, the debug binary I get is 264.4 KiB; when I uncomment them it becomes 447.7 KiB, 70% bigger than the original.

Same with release mode: 37.5 KiB vs. 59.0 KiB, 60% bigger than the original.

It makes sense to replace the recursion with iteration, something like an initializer list applied to the variadic Unpack<...>::unpack(), so that C++ would generate only one template instantiation per type.

The code above compiles just fine, if you want to play with it a little.

I wrote some crazy thing with templates and index sequences and tuples fully constrained with concepts from range-v3, and it was fine. Then it occurred to me that it would be easier for the compiler to optimize if you unpack the arguments directly into a function call. First we make a class that can deserialize any POD type (this could probably be relaxed to trivially copyable) from a byte buffer:

struct deserializer {
  const std::uint8_t* in_;

  deserializer(const std::uint8_t* in) : in_{in} {}

  template <typename T>
  operator T() {
    static_assert(std::is_pod<T>(), "");
    T t;
    std::memcpy(&t, in_, sizeof(T));
    in_ += sizeof(T);
    return t;
  }
};

and then you can generically implement unpack as:

template <typename...Ts, typename F>
void unpack(F&& f, const std::uint8_t* from) {
  deserializer d{from};
  std::forward<F>(f)(static_cast<Ts>(d)...); // Oops, broken.
}

Except that it has unspecified behavior, because the order of evaluation of function arguments is unspecified. Let's introduce a type to forward the parameters to the function, so that we can use brace initialization to force left-to-right evaluation:

struct forwarder {
  template <typename F, typename...Ts>
  forwarder(F&& f, Ts&&...ts) {
    std::forward<F>(f)(std::forward<Ts>(ts)...);
  }
};

// Requires explicit specification of argument types.
template <typename...Ts, typename F>
void unpack(F&& f, const std::uint8_t* from) {
  deserializer d{from};
  forwarder{std::forward<F>(f), static_cast<Ts>(d)...};
}

and throw in a couple of specializations to deduce argument types from function pointers and std::function so we don't always have to specify them:

// Deduce argument types from std::function
template <typename R, typename...Args>
void unpack(std::function<R(Args...)> f, const std::uint8_t* from) {
  unpack<Args...>(std::move(f), from);
}

// Deduce argument types from function pointer
template <typename R, typename...Args>
void unpack(R (*f)(Args...), const std::uint8_t* from) {
  unpack<Args...>(f, from);
}

It's all nicely exposed to the compiler and very optimizable. The change in binary size between the single- and double-call versions is minimal (stealing TC's framework):

Using function pointers: ~2K at -O0, 64B at -O3.

Using std::function: ~3K at -O0, 216B at -O3.

The code to unpack and call is a couple of dozen assembly instructions. E.g., with gcc 4.9.2 on x64 optimizing for size with -Os, the explicit specialization

template void unpack(decltype(foo1), const std::uint8_t*);

assembles to:

pushq   %rax
movq    %rsi, %rax
movswl  4(%rsi), %ecx
movzwl  2(%rsi), %edx
movq    %rdi, %r10
movsbl  1(%rsi), %esi
movzbl  (%rax), %edi
pushq   22(%rax)
pushq   14(%rax)
movl    10(%rax), %r9d
movl    6(%rax), %r8d
movsd   34(%rax), %xmm1
movss   30(%rax), %xmm0
call    *(%r10)
addq    $24, %rsp
ret

The code size is small enough to be inlined effectively, so the number of templates generated isn't a factor.

EDIT: Generalizing to non-PODs.

Wrapping up the input iterator in deserializer and using a conversion operator to perform the actual unpacking is "clever" - using both the positive and the negative connotations of "clever" - but it's not extensible. Client code can't add operator blahblah member function overloads, and the only way to control overloading for conversion operators is with heaps of SFINAE. Yuck. So let's give up the deserializer idea, and use an extensible dispatch mechanism.

First, a metafunction to strip references and cv-qualifiers so that we can, e.g., unpack a std::vector<double> when a parameter's declared type is const std::vector<double>&:

template <typename T>
using uncvref =
  typename std::remove_cv<
    typename std::remove_reference<T>::type
  >::type;

I'm a fan of tag dispatching, so devise a tag wrapper that can hold any type:

template <typename T> struct arg_tag {};

and then we can have a generic argument unpack function that performs the tag dispatch:

template <typename T>
uncvref<T> unpack_arg(const std::uint8_t*& from) {
  return unpack_arg(arg_tag<uncvref<T>>{}, from);
}
Thanks to the magic of Argument-Dependent Lookup, overloads of unpack_arg declared after the definition of the dispatcher will be found, as long as they are declared before use. That is, the dispatch system is easily extensible. We'll provide the POD unpacker:

template <typename T, typename std::enable_if<std::is_trivial<T>::value, int>::type = 0>
T unpack_arg(arg_tag<T>, const std::uint8_t*& from) {
  T t;
  std::memcpy(&t, from, sizeof(T));
  from += sizeof(T);
  return t;
}

which technically matches any arg_tag, but is removed from overload resolution by SFINAE if the matched type is non-trivial. (Yes, I know I said POD before. I changed my mind; trivial types are a little more general and still memcpy-able.) The front end to this dispatch mechanism doesn't change much:

struct forwarder {
  template <typename F, typename...Args>
  forwarder(F&& f, Args&&...args) {
    std::forward<F>(f)(std::forward<Args>(args)...);
  }
};

// Requires explicit specification of argument types.
template <typename...Ts, typename F>
void unpack(F&& f, const std::uint8_t* from) {
  forwarder{std::forward<F>(f), unpack_arg<Ts>(from)...};
}

forwarder is unchanged, and the unpack<Types...>() API uses unpack_arg<Ts>(from)... in place of static_cast<Ts>(d)..., but obviously still has the same structure. The type-deducing overloads:

template <typename R, typename...Args>
void unpack(std::function<R(Args...)> f, const std::uint8_t* from) {
  unpack<Args...>(std::move(f), from);
}

template <typename R, typename...Args>
void unpack(R (*f)(Args...), const std::uint8_t* from) {
  unpack<Args...>(f, from);
}

work correctly, unchanged. Now we can provide an extension to unpack vectors by overloading unpack_arg for arg_tag<std::vector<T>>:

using vec_size_t = int;

template <typename T>
std::vector<T> unpack_arg(arg_tag<std::vector<T>>, const std::uint8_t*& from) {
  std::vector<T> vec;
  auto n = unpack_arg<vec_size_t>(from);
  vec.reserve(n);
  std::generate_n(std::back_inserter(vec), n, [&from]{
    return unpack_arg<T>(from);
  });
  return vec;
}

Note how the vector unpack overload goes through the dispatcher to unpack its components: unpack_arg<vec_size_t>(from) for the size, and unpack_arg<T>(from) for each of the elements.

Edit again: std::function<void()>

Now the code has a problem: if f is std::function<void()> or void(*)(void) then the unpack overloads that deduce argument types from f will call themselves and recurse infinitely. The easiest fix is to name the function that does the actual work of unpacking something different - I'll pick unpack_explicit - and have the various unpack frontends call it:

template <typename...Ts, typename F>
void unpack_explicit(F&& f, const std::uint8_t* from) {
  forwarder{std::forward<F>(f), unpack_arg<Ts>(from)...};
}

// Requires explicit specification of argument types.
template <typename...Ts, typename F>
void unpack(F&& f, const std::uint8_t* from) {
  unpack_explicit<Ts...>(std::forward<F>(f), from);
}

// Deduce argument types from std::function
template <typename R, typename...Args>
void unpack(std::function<R(Args...)> f, const std::uint8_t* from) {
  unpack_explicit<Args...>(std::move(f), from);
}

// Deduce argument types from function pointer
template <typename R, typename...Args>
void unpack(R (*f)(Args...), const std::uint8_t* from) {
  unpack_explicit<Args...>(f, from);
}

Here it is all put together. If you prefer to get a compile error for functions with return types other than void, drop the R parameter that deduces the return type from the deducing overloads and simply use void:

// Deduce argument types from std::function
template <typename...Args>
void unpack(std::function<void(Args...)> f, const std::uint8_t* from) {
  unpack_explicit<Args...>(std::move(f), from);
}

// Deduce argument types from function pointer
template <typename...Args>
void unpack(void (*f)(Args...), const std::uint8_t* from) {
  unpack_explicit<Args...>(f, from);
}

First, a function to perform the actual unpacking. Specialize as needed.

template<class T>
T do_unpack(uint8_t * data){
    T val;
    memcpy(&val, data, sizeof(T));
    return val;
}

Next, a recursive template to compute the offset of the I-th element. This could be written as an iterative C++14 constexpr function as well, but GCC 4.9 doesn't support that, and doesn't seem to optimize the non-constexpr version well. And a C++11 return-only recursive constexpr doesn't feel like it's worth the trouble over the traditional approach.

// compute the offset of the I-th element
template<size_t I, class T, class... Ts>
struct get_offset_temp {
    static constexpr size_t value = get_offset_temp<I-1, Ts...>::value + sizeof(T);
};

template<class T, class... Ts>
struct get_offset_temp<0, T, Ts...>{
    static constexpr size_t value = 0;
};

Now, a function to retrieve the I-th argument, using the computed offset:

template<size_t I, class... Ts>
std::tuple_element_t<I, std::tuple<Ts...>> unpack_arg(uint8_t *data){
     using T = std::tuple_element_t<I, std::tuple<Ts...>>;
     return do_unpack<T>(data + get_offset_temp<I, Ts...>::value);
}

And finally, the function that unpacks the arguments and calls the function. To avoid a needless copy of f, I pass it by reference:

template<class... Ts, size_t... Is>
void unpack(const std::function<void(Ts...)> &f, uint8_t *dataOffset, std::index_sequence<Is...>){
    f(unpack_arg<Is, Ts...>(dataOffset)...);
}

And the actual function you call, which merely constructs a compile-time integer sequence and calls the function above:

template<class... Ts>
void unpack(std::function<void(Ts...)> f, uint8_t *dataOffset){
    return unpack(f, dataOffset, std::index_sequence_for<Ts...>());
}

Demo.

The difference in binary size between one and two calls is ~1 KiB at -O3, and ~8 KiB at -O0.

index_sequence and friends are C++14 features, but implementable in C++11. There are plenty of implementations on SO. For C++11, also replace tuple_element_t<...> with typename tuple_element<...>::type .
