在 C++ 中复制具有特定更改的向量的最快方法

Question

我想在将元素属性设置为零的同时复制向量。 我有一个std::vector<PLY>向量，其中包含特定数量的以下结构元素：

struct PLY{
    float x;
    float y;
    float z;
}

创建此向量副本的最快方法是什么，其中每个 PLY 元素的z-value 0 ？ 有没有比创建向量的副本然后迭代每个元素来设置新的 z 值更快的方法？

Answer 1

您可以使用std::transform ：

std::vector<PLY> zeroed{};
zeroed.reserve(other_vec.size()); //pre-allocate the storage
std::transform(other_vec.begin(), other_vec.end(), std::back_inserter(zeroed), 
           [](auto e){ e.z = 0.f; return e; });

Answer 2

什么是最快的方式...？

第一个回答

测试一下。 内存架构做了令人惊讶的事情。

#include <iostream>
#include <chrono>
#include <vector>
#include <iomanip>
#include <algorithm>

struct PLY
{
    PLY() : x(0), y(0), z(0) {}
    PLY(float x, float y, float z) : x(x), y(y), z(z) {}
    float x, y , z;
};



template<class F>
std::vector<PLY> test(const char* name, std::vector<PLY> samples, F f)
{
    using namespace std::literals;
    std::vector<PLY> result;
    result.reserve(samples.size());

    auto start = std::chrono::high_resolution_clock::now();

    f(result, samples);

    auto end = std::chrono::high_resolution_clock::now();

    using fns = std::chrono::duration<long double, std::chrono::nanoseconds::period>;
    using fms = std::chrono::duration<long double, std::chrono::milliseconds::period>;
    using fs = std::chrono::duration<long double, std::chrono::seconds::period>;

    auto interval = fns(end - start);
    auto time_per_sample = interval / samples.size();
    auto samples_per_second = 1s / time_per_sample;

    std::cout << "testing " << name << '\n';
    std::cout << " sample size        : " << samples.size() << '\n';
    std::cout << " time taken         : " << std::fixed << fms(interval).count() << "ms\n";
    std::cout << " time per sample    : " << std::fixed << (interval / samples.size()).count() << "ns\n";
    std::cout << " samples per second : " << std::fixed << samples_per_second << "\n";

    return result;
}

struct zero_z_iterator : std::vector<PLY>::const_iterator
{
    using base_class = std::vector<PLY>::const_iterator;
    using value_type = PLY;

    using base_class::base_class;

    value_type operator*() const {
        auto const& src = base_class::operator*();
        return PLY{ src.x, src.y, 0.0 };
    }
};

int main()
{

    test("transform", std::vector<PLY>(1000000), [](auto& target, auto& source)
         {
             std::transform(source.begin(), source.end(),
                            std::back_inserter(target),
                            [](auto& ply) {
                                return PLY { ply.x, ply.y, ply.z };
                            });
         });

    test("copy and reset z", std::vector<PLY>(1000000), [](auto& target, auto& source)
         {
             std::copy(source.begin(), source.end(),
                       std::back_inserter(target));
             for (auto& x : target)
             {
                 x.z = 0;
             }
         });

    test("hand_roll", std::vector<PLY>(1000000), [](auto& target, auto& source)
         {
             for(auto& x : source) {
                 target.emplace_back(x.x, x.y, 0.0);
             }
         });

    test("assign through custom iterator", std::vector<PLY>(1000000), [](auto& target, auto& source)
         {
             target.assign(zero_z_iterator(source.begin()),
                                           zero_z_iterator(source.end()));
         });


    test("transform", std::vector<PLY>(100000000), [](auto& target, auto& source)
         {
             std::transform(source.begin(), source.end(),
                            std::back_inserter(target),
                            [](auto& ply) {
                                return PLY { ply.x, ply.y, ply.z };
                            });
         });

    test("copy and reset z", std::vector<PLY>(100000000), [](auto& target, auto& source)
         {
             std::copy(source.begin(), source.end(),
                       std::back_inserter(target));
             for (auto& x : target)
             {
                 x.z = 0;
             }
         });

    test("hand_roll", std::vector<PLY>(100000000), [](auto& target, auto& source)
         {
             for(auto& x : source) {
                 target.emplace_back(x.x, x.y, 0.0);
             }
         });

    test("assign through custom iterator", std::vector<PLY>(100000000), [](auto& target, auto& source)
         {
             target.assign(zero_z_iterator(source.begin()),
                           zero_z_iterator(source.end()));
         });
}

样本结果

testing transform
 sample size        : 1000000
 time taken         : 7.495685ms
 time per sample    : 7.495685ns
 samples per second : 133410088.604310
testing copy and reset z
 sample size        : 1000000
 time taken         : 3.436614ms
 time per sample    : 3.436614ns
 samples per second : 290984090.735823
testing hand_roll
 sample size        : 1000000
 time taken         : 3.289287ms
 time per sample    : 3.289287ns
 samples per second : 304017253.587176
testing assign through custom iterator
 sample size        : 1000000
 time taken         : 2.563334ms
 time per sample    : 2.563334ns
 samples per second : 390116933.649692
testing transform
 sample size        : 100000000
 time taken         : 768.941767ms
 time per sample    : 7.689418ns
 samples per second : 130048859.733744
testing copy and reset z
 sample size        : 100000000
 time taken         : 880.893920ms
 time per sample    : 8.808939ns
 samples per second : 113521046.892911
testing hand_roll
 sample size        : 100000000
 time taken         : 769.276240ms
 time per sample    : 7.692762ns
 samples per second : 129992315.894223
testing assign through custom iterator
 sample size        : 100000000
 time taken         : 689.493098ms
 time per sample    : 6.894931ns
 samples per second : 145034084.155546

最终答案

通过自定义转换迭代器进行分配。

工具箱的礼物

template<class Container, class Iter, class TransformFunction>
void assign_transform(Container& target, Iter first, Iter last, TransformFunction func)
{
    struct transform_iterator : Iter
    {
        using base_class = Iter;
        using value_type = typename Iter::value_type;

        transform_iterator(Iter base, TransformFunction& f)
        : base_class(base), func(std::addressof(f))
        {}

        value_type operator*() const {
            auto const& src = base_class::operator*();
            return (*func)(src);
        }
        TransformFunction* func;
    };

    target.assign(transform_iterator(first, func),
                  transform_iterator(last, func));
}

像这样使用：

         assign_transform(target, source.begin(), source.end(),
                          [](auto& from)
         {
             return PLY(from.x, from.y, 0.0);
         });

Answer 3

如果有，您的编译器可能会找到它。 尽可能简单明了地编写代码。 如果这在您的平台上有意义，这将为编译器提供最佳机会来一起优化副本和循环。

Answer 4

使用默认分配器的向量有两个问题：

如果向量被调整为更大的大小，它的每个元素都被初始化并且初始化有成本，
当为向量保留的内存和元素插入其中时，由于向量的大小更新，每次插入都会产生成本。

为了摆脱这里讨论的这个问题，可以使用自定义分配器来拒绝进行任何初始化。 创建具有所需大小的向量时， memcpy或 for 循环可用于复制数据：

#include <vector>
#include <cstring>
template <class T>
class no_init_alloc
    : public std::allocator<T>
{
public:
    using std::allocator<T>::allocator;

    template <class U, class... Args> void construct(U*, Args&&...) {}
};
struct PLY
{
    float x, y , z;
};
int main()
{
    std::vector<PLY> source(1000000);
    //create a vector with the custom allocator refusing any initialization
    std::vector<PLY, no_init_alloc<PLY>> target(source.size());
    //then we can use memcpy approach
    {
        memcpy(target.data(), source.data(), source.size() * sizeof(source.front()));
        for(auto& t : target) t.z = 0.0f;
    }
    // or simple for loop approach
    {
         size_t sz = target.size();
         for(size_t i = 0; i < sz; ++i) {
            target[i].x = source[i].x;
            target[i].y = source[i].y;
            target[i].z = 0.0f;
         }
    }

}

使用@Richard Hodges 的基准和 -O2 优化，结果是：

CLNAG：

testing transform
 sample size        : 1000000
 time taken         : 8.363995ms
 time per sample    : 8.363995ns
 samples per second : 119560090.602637
testing assign through custom iterator
 sample size        : 1000000
 time taken         : 7.162974ms
 time per sample    : 7.162974ns
 samples per second : 139606816.945029
testing no_init_alloc_memcpy
 sample size        : 1000000
 time taken         : 6.918533ms
 time per sample    : 6.918533ns
 samples per second : 144539312.018892
testing no_init_alloc_for
 sample size        : 1000000
 time taken         : 6.383721ms
 time per sample    : 6.383721ns
 samples per second : 156648450.018414

海湾合作委员会：

testing transform
 sample size        : 1000000
 time taken         : 12.083038ms
 time per sample    : 12.083038ns
 samples per second : 82760643.473934
testing assign through custom iterator
 sample size        : 1000000
 time taken         : 6.188324ms
 time per sample    : 6.188324ns
 samples per second : 161594641.780230
testing no_init_alloc_memcpy
 sample size        : 1000000
 time taken         : 3.000699ms
 time per sample    : 3.000699ns
 samples per second : 333255684.758785
testing no_init_alloc_for
 sample size        : 1000000
 time taken         : 1.979482ms
 time per sample    : 1.979482ns
 samples per second : 505182669.001284

最终答案：

使用带有简单 for 循环的自定义非初始化分配器。

Answer 5

听起来像是std::transform的工作，带有一个小 lambda 来对每个元素进行转换。

在 C++ 中复制具有特定更改的向量的最快方法

问题描述

5 个解决方案

解决方案1
8 2016-08-22 09:31:16

解决方案2
7 已采纳 2016-08-22 10:10:57

第一个回答

样本结果

最终答案

工具箱的礼物

解决方案3
3 2016-08-22 09:31:49

解决方案4
3 2016-08-22 17:57:49

解决方案5
2 2016-08-22 09:31:32

在 C++ 中复制具有特定更改的向量的最快方法

问题描述

5 个解决方案

解决方案1 8 2016-08-22 09:31:16

解决方案2 7 已采纳 2016-08-22 10:10:57

第一个回答

样本结果

最终答案

工具箱的礼物

解决方案3 3 2016-08-22 09:31:49

解决方案4 3 2016-08-22 17:57:49

解决方案5 2 2016-08-22 09:31:32

解决方案1
8 2016-08-22 09:31:16

解决方案2
7 已采纳 2016-08-22 10:10:57

解决方案3
3 2016-08-22 09:31:49

解决方案4
3 2016-08-22 17:57:49

解决方案5
2 2016-08-22 09:31:32