
std::transform slower than for loops

I am implementing a matrix class and planned to use std::transform from <algorithm> for the calculations, but I found that in some situations a hand-written loop is faster.

Take operator+= for element-wise addition as an example. When the rhs matrix has one column and the same number of rows as the lhs matrix, I can do the following:

// Add the single rhs column to each column of the lhs matrix in turn.
for (auto c = 0; c < cols(); ++c) {
    std::transform(std::execution::par, col_begin(c), col_end(c), rhs.begin(), col_begin(c), std::plus<>());
}

or use simple loops:

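auto lhsval = begin();
auto rhsval = rhs.begin();

// Walk the lhs elements in storage order; each row uses one rhs value.
for (auto r = 0; r < rows(); ++r) {
    for (auto c = 0; c < cols(); ++c) {
        *lhsval += *rhsval;
        ++lhsval;
    }
    ++rhsval;  // move to the rhs value for the next row
}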

For reference, I wrote an iterator that takes a step, so col_begin() returns an iterator whose operator++ skips over the other columns.

I timed both implementations with Google Benchmark, and the loop is about 5 times faster than std::transform. I expected some difference, but not one this large.

The complete code is in my GitHub repo: matrix class, matrix iterator.

Passing std::execution::par asks the library to parallelize the operation. That adds overhead, even if the library only ends up deciding that the problem is too small to parallelize. The number of elements being transformed has to be quite large (sometimes hundreds of thousands or millions) before parallelization pays off, and it also depends on the hardware: parallelizing on a two-core machine is much less likely to be worth it than on a 64-core machine.

The for loop version is much more similar to plain std::transform without the std::execution::par parameter. If you remove that parameter and the performance difference is still large, please update your question with that information, alongside your compiler version, platform, compiler switches and information about your data set: number of rows/columns, etc.
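To make the suggested comparison concrete, here is a minimal sketch of the column-wise std::transform call without the execution policy, next to the hand-written loop. The asker's matrix and iterator classes are not shown in the post, so StrideIterator, Matrix, add_column_transform and add_column_loop are hypothetical stand-ins. The sketch assumes a row-major matrix of double whose column iterator advances by the number of columns.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <iterator>
#include <vector>

// Hypothetical iterator that advances by a fixed step. It stands in for the
// asker's step iterator, whose real implementation is not shown in the post.
class StrideIterator {
public:
    using iterator_category = std::forward_iterator_tag;
    using value_type        = double;
    using difference_type   = std::ptrdiff_t;
    using pointer           = double*;
    using reference         = double&;

    StrideIterator() = default;
    StrideIterator(double* base, std::size_t index, std::size_t step)
        : base_(base), index_(index), step_(step) {}

    reference operator*() const { return base_[index_]; }
    StrideIterator& operator++() { index_ += step_; return *this; }
    StrideIterator operator++(int) { StrideIterator tmp = *this; ++*this; return tmp; }

    friend bool operator==(const StrideIterator& a, const StrideIterator& b) {
        return a.index_ == b.index_;
    }
    friend bool operator!=(const StrideIterator& a, const StrideIterator& b) {
        return !(a == b);
    }

private:
    double*     base_  = nullptr;
    std::size_t index_ = 0;  // offset into the storage, so no out-of-range pointer is formed
    std::size_t step_  = 1;
};

// Hypothetical row-major matrix (assumed layout): element (r, c) is data[r * cols + c].
struct Matrix {
    std::size_t rows;
    std::size_t cols;
    std::vector<double> data;

    StrideIterator col_begin(std::size_t c) { return {data.data(), c, cols}; }
    StrideIterator col_end(std::size_t c)   { return {data.data(), c + rows * cols, cols}; }
};

// Column-wise version from the question, with std::execution::par removed.
void add_column_transform(Matrix& m, const std::vector<double>& rhs) {
    for (std::size_t c = 0; c < m.cols; ++c) {
        std::transform(m.col_begin(c), m.col_end(c), rhs.begin(),
                       m.col_begin(c), std::plus<>());
    }
}

// Hand-written loop from the question: storage order, one rhs value per row.
void add_column_loop(Matrix& m, const std::vector<double>& rhs) {
    auto lhsval = m.data.begin();
    auto rhsval = rhs.begin();
    for (std::size_t r = 0; r < m.rows; ++r) {
        for (std::size_t c = 0; c < m.cols; ++c) {
            *lhsval += *rhsval;
            ++lhsval;
        }
        ++rhsval;
    }
}
```

Both functions produce the same result, so they can be swapped inside the same benchmark. Note that even without the execution policy, the column-wise version walks memory with a stride while the loop walks it contiguously, which may account for part of any remaining difference.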

