
std::transform slower than for loops

I thought about implementing a matrix class that uses std::transform from <algorithm> for its calculations, but I found that in some situations it's faster to write plain loops.

Take operator+= for element-wise addition as an example. If the rhs matrix has 1 column and the same number of rows as the lhs matrix, I can do the following:

// Add the single rhs column to every column of *this, one std::transform call per column
for (auto c = 0; c < cols(); ++c) {
    std::transform(std::execution::par, col_begin(c), col_end(c), rhs.begin(), col_begin(c), std::plus<>());
}

or use simple loops:

auto lhsval = begin();
auto rhsval = rhs.begin();

for (auto r = 0; r < rows(); ++r) {
    for (auto c = 0; c < cols(); ++c) {
        *lhsval += *rhsval;  // add this row's rhs value
        ++lhsval;            // next element of *this
    }
    ++rhsval;                // next row uses the next rhs value
}
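To try the loop version outside the matrix class, here is a minimal self-contained sketch. It assumes, as the loop above implies, that the matrix is stored row-major in one contiguous buffer; the free function `add_column` and its parameters are hypothetical stand-ins for the class members `begin()`, `rows()` and `cols()`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical free-function version of the loop above (not the asker's code).
// lhs: rows x cols matrix, assumed stored row-major in a flat vector.
// rhs: column vector with `rows` elements; rhs[r] is added to every
//      element of row r (element-wise add, broadcast across columns).
void add_column(std::vector<double>& lhs, std::size_t rows, std::size_t cols,
                const std::vector<double>& rhs) {
    assert(lhs.size() == rows * cols && rhs.size() == rows);
    auto lhsval = lhs.begin();
    auto rhsval = rhs.begin();
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            *lhsval += *rhsval;  // same rhs value for the whole row
            ++lhsval;            // walk lhs contiguously
        }
        ++rhsval;                // next row uses the next rhs element
    }
}
```

For example, starting from the 2 x 3 matrix `{1, 2, 3, 4, 5, 6}` and `rhs = {10, 20}`, `add_column(m, 2, 3, rhs)` leaves `m` as `{11, 12, 13, 24, 25, 26}`. Using `std::size_t` for the counters avoids the signed/unsigned comparison that `auto r = 0` can trigger.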

For your information, I wrote an iterator that accepts a step, so col_begin() returns an iterator whose operator++ skips over the other columns.
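The asker's iterator is not shown here, so the following is only a sketch of what a step-taking iterator could look like, assuming row-major storage so that stepping by `cols` elements walks down one column. `strided_iterator`, the free functions `col_begin`/`col_end` and `add_column_transform` are all hypothetical stand-ins for the members in the repo. The `std::execution::par` argument from the question is left out, so this is the sequential form of the same per-column call.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <iterator>
#include <vector>

// Hypothetical strided iterator: operator++ advances by `stride` elements.
// It keeps a base pointer plus an index, so the end position may lie past
// the buffer without forming an out-of-range pointer; only valid indices
// are ever dereferenced.
template <typename T>
class strided_iterator {
public:
    using iterator_category = std::forward_iterator_tag;
    using value_type = T;
    using difference_type = std::ptrdiff_t;
    using pointer = T*;
    using reference = T&;

    strided_iterator() = default;
    strided_iterator(T* base, std::ptrdiff_t index, std::ptrdiff_t stride)
        : base_(base), index_(index), stride_(stride) {}

    reference operator*() const { return base_[index_]; }
    pointer operator->() const { return base_ + index_; }

    strided_iterator& operator++() { index_ += stride_; return *this; }
    strided_iterator operator++(int) { strided_iterator tmp = *this; ++*this; return tmp; }

    friend bool operator==(const strided_iterator& a, const strided_iterator& b) {
        return a.base_ == b.base_ && a.index_ == b.index_;
    }
    friend bool operator!=(const strided_iterator& a, const strided_iterator& b) {
        return !(a == b);
    }

private:
    T* base_ = nullptr;
    std::ptrdiff_t index_ = 0;
    std::ptrdiff_t stride_ = 1;
};

// Hypothetical stand-ins for col_begin(c) / col_end(c): first element of
// column c, and one stride past its last element.
template <typename T>
strided_iterator<T> col_begin(std::vector<T>& m, std::size_t cols, std::size_t c) {
    return {m.data(), static_cast<std::ptrdiff_t>(c), static_cast<std::ptrdiff_t>(cols)};
}

template <typename T>
strided_iterator<T> col_end(std::vector<T>& m, std::size_t rows, std::size_t cols, std::size_t c) {
    return {m.data(), static_cast<std::ptrdiff_t>(c + rows * cols), static_cast<std::ptrdiff_t>(cols)};
}

// Column-wise broadcast add: one std::transform per column, writing in place
// (std::transform allows the output to be the same as the first input).
void add_column_transform(std::vector<double>& lhs, std::size_t rows, std::size_t cols,
                          const std::vector<double>& rhs) {
    assert(lhs.size() == rows * cols && rhs.size() == rows);
    for (std::size_t c = 0; c < cols; ++c) {
        std::transform(col_begin(lhs, cols, c), col_end(lhs, rows, cols, c),
                       rhs.begin(), col_begin(lhs, cols, c), std::plus<>());
    }
}
```

Applied to the 2 x 3 matrix `{1, 2, 3, 4, 5, 6}` with `rhs = {10, 20}`, it yields `{11, 12, 13, 24, 25, 26}`, the same result as the loop.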

I timed both implementations with Google Benchmark and concluded that the loop is about 5 times faster than std::transform. Maybe there should be some difference, but not one that large.

You can look at the complete code in my GitHub repo: matrix class, matrix iterator.

Passing std::execution::par asks the library to parallelize this operation. This adds overhead, even if it is only to determine that "your problem is too small to parallelize", and in the code above that overhead is paid once per column, because std::transform is called inside the loop over cols(). The number of elements being transformed has to be quite large (sometimes hundreds of thousands or millions) before parallelization is worthwhile, and it requires appropriate hardware (parallelizing on a two-core machine is much less likely to be worth it than on a 64-core machine).

The for loop version is much closer to plain std::transform without the std::execution::par parameter. If you remove that parameter and the performance difference is still large, please update your question with that information, along with your compiler version, platform, compiler switches and details about your data set (number of rows/columns, etc.).
