Fastest way to negate a std::vector

Question

Assume I have a std::vector of double, namely

std::vector<double> MyVec(N);

where N is so large that performance matters. Now assume that MyVec is a nontrivial vector (i.e., it is not a vector of zeros, but has been modified by some routine). I need the negated version of the vector: I need -MyVec.

So far, I have been implementing it via

std::transform(MyVec.cbegin(),MyVec.cend(),MyVec.begin(),std::negate<double>());

But I really do not know whether this is sensible or just super naïve on my part.

Am I doing it correctly? Or is std::transform just a super slow routine in this case?

PS: I use the BLAS and LAPACK libraries all the time, but I have not found anything that matches this particular need. However, if there is such a function in BLAS/LAPACK that is faster than std::transform, I would be glad to know.

Answer 1

#include <vector>
#include <algorithm>
#include <functional> 
void check()
{
    std::vector<double> MyVec(255);
    std::transform(MyVec.cbegin(),MyVec.cend(),MyVec.begin(),std::negate<double>());
}

Compiled on https://godbolt.org/ with the option -O3, this code generates nice assembly:

.L3:
[...]
  cmp r8, 254
  je .L4
  movsd xmm0, QWORD PTR [rdi+2032]
  xorpd xmm0, XMMWORD PTR .LC0[rip]
  movsd QWORD PTR [rdi+2032], xmm0
.L4:

It's hard to imagine anything faster. Your code is already fine: don't try to outsmart the compiler, and write clean C++ code; it works almost every time.
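In the listing above, the negation is a single xorpd: for IEEE-754 doubles, negating a value only flips its sign bit. The constant .LC0 is not shown, so it is an assumption here that it holds the mask 0x8000000000000000 in each 64-bit lane. A minimal scalar sketch of the same trick, with negate_by_sign_bit as a hypothetical helper name:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helper: negate a double by XOR-ing its IEEE-754 sign bit,
// which is what `xorpd` with a sign-mask constant does per 64-bit lane.
// Assumes double is a 64-bit IEEE-754 type (true on mainstream platforms).
inline double negate_by_sign_bit(double x)
{
    static_assert(sizeof(double) == sizeof(std::uint64_t), "expects 64-bit double");
    std::uint64_t bits;
    std::memcpy(&bits, &x, sizeof bits);   // well-defined type punning
    bits ^= UINT64_C(0x8000000000000000);  // flip the sign bit only
    std::memcpy(&x, &bits, sizeof bits);
    return x;
}

// Usage check: behaves like unary minus, including on signed zero.
inline void check_negate_by_sign_bit()
{
    assert(negate_by_sign_bit(1.5) == -1.5);
    assert(std::signbit(negate_by_sign_bit(0.0)));  // +0.0 -> -0.0
}
```

Unlike 0.0 - x, this maps +0.0 to -0.0 exactly as -x does, which is why a compiler can implement std::negate<double> with a plain XOR.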

Answer 2

Fortunately the data in std::vector is contiguous, so you can multiply by -1 using vector intrinsics (using unaligned loads/stores and special handling of the leftover elements that do not fill a whole vector register). Or use ippsMulC_64f / ippsMulC_64f_I from Intel's IPP library (you'll struggle to write something faster), which will use the largest vector registers available on your platform: https://software.intel.com/en-us/ipp-dev-reference-mulc

Update: to clear up some confusion in the comments, the full version of Intel IPP is free (although you can pay for support) and is available on Linux, Windows and macOS.
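A minimal sketch of the intrinsics route, assuming an x86 target with SSE2 (two doubles per 128-bit register; AVX or AVX-512 would widen the same loop). negate_sse2 is a hypothetical name; on targets where __SSE2__ is not defined, the preprocessor guard falls back to the plain scalar loop:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

// Hypothetical helper: negate v in place by multiplying by -1.0, two doubles
// at a time with unaligned SSE2 loads/stores; leftover elements that do not
// fill a full register are handled by the scalar tail loop.
void negate_sse2(std::vector<double>& v)
{
    double* p = v.data();
    const std::size_t n = v.size();
    std::size_t i = 0;
#if defined(__SSE2__)
    const __m128d minus_one = _mm_set1_pd(-1.0);
    for (; i + 2 <= n; i += 2) {
        __m128d x = _mm_loadu_pd(p + i);              // unaligned load
        _mm_storeu_pd(p + i, _mm_mul_pd(x, minus_one)); // unaligned store
    }
#endif
    for (; i < n; ++i) p[i] = -p[i];  // tail (or whole vector without SSE2)
}

// Usage check: an odd size exercises the scalar tail.
inline void check_negate_sse2()
{
    std::vector<double> v{1.0, -2.0, 3.5, 0.0, 4.25};
    negate_sse2(v);
    assert(v[0] == -1.0 && v[1] == 2.0 && v[2] == -3.5 && v[4] == -4.25);
}
```

In practice GCC and Clang usually auto-vectorize the plain scalar loop at -O3, so hand-written intrinsics mainly pay off when you need a specific instruction set without relying on the optimizer.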

Answer 3

As others have mentioned, it completely depends on your use case. Probably the simplest way would be something like this:

 #include <cstddef>
 #include <vector>

 using MyVect = std::vector<double>;  // assumed: the user's vector type

 struct MyNegatingVect {
     MyVect data;
     bool negated = false;
     void negate() { negated = !negated; }
     // ... setter and getter need indirection ...
     // ..for example
     MyVect::value_type at(std::size_t index) const { return negated ? -data.at(index) : data.at(index); }
 };

Whether this extra indirection on every single access is worth turning the negation into flipping a single bool depends, as already mentioned, on your use case (actually, I doubt there is a use case where this would bring any measurable benefit).
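One way to bound the cost of that indirection is to apply the pending negation in a single pass once the flag is no longer wanted, for example before handing the raw data to BLAS/LAPACK. A minimal sketch; LazyNegated, set and materialize are hypothetical names, not part of any library:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical lazily-negated wrapper: negate() is O(1), every read and
// write pays a branch, and materialize() applies any pending negation once.
template <typename T>
struct LazyNegated {
    std::vector<T> data;
    bool negated = false;

    void negate() { negated = !negated; }  // O(1): just flip the flag

    T at(std::size_t i) const { return negated ? -data.at(i) : data.at(i); }

    // Store the value so that at(i) later returns x.
    void set(std::size_t i, T x) { data.at(i) = negated ? -x : x; }

    // Pay the O(N) cost once and clear the flag.
    std::vector<T>& materialize()
    {
        if (negated) {
            for (auto& x : data) x = -x;
            negated = false;
        }
        return data;
    }
};

// Usage check.
inline void check_lazy_negated()
{
    LazyNegated<double> v;
    v.data = {1.0, -2.0, 3.0};
    v.negate();
    assert(v.at(0) == -1.0 && v.at(1) == 2.0);
    v.set(2, 5.0);
    assert(v.at(2) == 5.0);
}
```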

Answer 4

First, a generic negate function for arithmetic type vectors as an example:

#include <type_traits>
#include <vector>

...

template <typename arithmetic_type> std::vector<arithmetic_type> &
negate (std::vector<arithmetic_type> & v)
{
    static_assert(std::is_arithmetic<arithmetic_type>::value,
        "negate: not an arithmetic type vector");

    for (auto & vi : v) vi = - vi;

    // note: anticipate that a range-based for may be more amenable
    // to loop-unrolling, vectorization, etc., due to fewer compiler
    // template transforms, and contiguous memory / stride.

    // in theory, std::transform may generate the same code, despite
    // being less concise. very large vectors *may* possibly benefit
    // from C++17's 'std::execution::par_unseq' policy?

    return v;
}

Your wish for a canonical unary operator- function is going to require the creation of a temporary, in the form:

std::vector<double> operator - (const std::vector<double> & v)
{
    auto ret (v); return negate(ret);
}

Or generically:

template <typename arithmetic_type> std::vector<arithmetic_type>
operator - (const std::vector<arithmetic_type> & v)
{
    auto ret (v); return negate(ret);
}

Do not be tempted to implement the operator as:

template <typename arithmetic_type> std::vector<arithmetic_type> &
operator - (std::vector<arithmetic_type> & v)
{
    return negate(v);
}

While (- v) will negate the elements and return the modified vector without the need for a temporary, it breaks mathematical conventions by effectively setting: v = - v; If that's your goal, then use the negate function. Don't break expected operator evaluation!
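If the copy is a concern, a variant worth considering takes the vector by value: an lvalue argument is still copied (the unavoidable temporary), but an rvalue argument is moved in and its buffer reused, so an expression like -make_vector() allocates nothing extra. A minimal sketch of this alternative, not the answer's original code:

```cpp
#include <cassert>
#include <type_traits>
#include <utility>
#include <vector>

// Variant: take the vector by value. Lvalues are copied, rvalues are moved,
// and the (possibly reused) buffer is negated and moved out on return.
template <typename arithmetic_type>
std::vector<arithmetic_type> operator-(std::vector<arithmetic_type> v)
{
    static_assert(std::is_arithmetic<arithmetic_type>::value,
        "operator-: not an arithmetic type vector");
    for (auto& vi : v) vi = -vi;
    return v;  // implicitly moved, no extra copy
}

// Usage check: the operand is left untouched, as mathematical convention expects.
inline void check_unary_minus()
{
    std::vector<double> a{1.0, -2.0};
    std::vector<double> b = -a;
    assert(b[0] == -1.0 && b[1] == 2.0);
    assert(a[0] == 1.0 && a[1] == -2.0);
}
```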

clang, with AVX-512 enabled, generates this loop, negating an impressive 64 doubles per iteration (the elements before and after the loop are handled by separate prologue/epilogue code):

        vpbroadcastq    LCPI0_0(%rip), %zmm0
        .p2align        4, 0x90
LBB0_21:
        vpxorq  -448(%rsi), %zmm0, %zmm1
        vpxorq  -384(%rsi), %zmm0, %zmm2
        vpxorq  -320(%rsi), %zmm0, %zmm3
        vpxorq  -256(%rsi), %zmm0, %zmm4
        vmovdqu64       %zmm1, -448(%rsi)
        vmovdqu64       %zmm2, -384(%rsi)
        vmovdqu64       %zmm3, -320(%rsi)
        vmovdqu64       %zmm4, -256(%rsi)
        vpxorq  -192(%rsi), %zmm0, %zmm1
        vpxorq  -128(%rsi), %zmm0, %zmm2
        vpxorq  -64(%rsi), %zmm0, %zmm3
        vpxorq  (%rsi), %zmm0, %zmm4
        vmovdqu64       %zmm1, -192(%rsi)
        vmovdqu64       %zmm2, -128(%rsi)
        vmovdqu64       %zmm3, -64(%rsi)
        vmovdqu64       %zmm4, (%rsi)
        addq    $512, %rsi              ## imm = 0x200
        addq    $-64, %rdx
        jne     LBB0_21

gcc-7.2.0 generates a similar loop, but appears to insist on indexed addressing.

Answer 5

Use for_each

std::for_each(MyVec.begin(), MyVec.end(), [](double& val) { val = -val; });

or the C++17 parallel version (needs #include <algorithm> and #include <execution>):

std::for_each(std::execution::par_unseq, MyVec.begin(), MyVec.end(), [](double& val) { val = -val; });

Fastest way to negate a std::vector

Question

5 answers

solution1
28 ACCEPTED 2017-11-15 17:03:34

solution2
17 2017-11-15 14:51:56

solution3
3 2017-11-15 15:04:26

solution4
3 2017-11-15 19:07:14

solution5
0 2017-11-16 06:35:52
