
Fastest way to negate a std::vector

Assume I have a std::vector of double, namely

std::vector<double> MyVec(N);

Where N is so big that performance matters. Now assume that MyVec is a nontrivial vector (i.e. it is not a vector of zeros, but has been modified by some routine). I need the negated version of the vector: I need -MyVec.

So far, I have been implementing it via

std::transform(MyVec.cbegin(),MyVec.cend(),MyVec.begin(),std::negate<double>());

But, really, I do not know whether this is sensible or just super naïve on my part.

Am I doing it correctly? Or is std::transform just a super slow routine in this case?

PS: I am using BLAS and LAPACK libraries all the time, but I have not found anything that matches this particular need. However, if there exists such a function in BLAS/LAPACK which is faster than std::transform, I would be glad to know.

#include <vector>
#include <algorithm>
#include <functional> 
void check()
{
    std::vector<double> MyVec(255);
    std::transform(MyVec.cbegin(),MyVec.cend(),MyVec.begin(),std::negate<double>());
}

This code, compiled on https://godbolt.org/ with the option -O3, generates nice assembly:

.L3:
[...]
  cmp r8, 254
  je .L4
  movsd xmm0, QWORD PTR [rdi+2032]
  xorpd xmm0, XMMWORD PTR .LC0[rip]
  movsd QWORD PTR [rdi+2032], xmm0
.L4:

It's difficult to imagine anything faster. Your code is already perfect: don't try to outsmart the compiler. Write clean C++ code; it works almost every time.
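To make the xorpd in the listing a bit more concrete: negating an IEEE 754 double only requires flipping its sign bit, which is presumably what the constant at .LC0 encodes (the listing does not show it, so that part is an assumption). A minimal sketch, where negate_by_sign_bit and check_negate_by_sign_bit are hypothetical names used only for illustration:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// Assumption: double is IEEE 754 binary64, so bit 63 is the sign bit.
static_assert(std::numeric_limits<double>::is_iec559 && sizeof(double) == 8,
              "this sketch assumes IEEE 754 binary64 doubles");

// Hypothetical helper: negate x by flipping only its sign bit.
inline double negate_by_sign_bit(double x) {
    std::uint64_t bits;
    std::memcpy(&bits, &x, sizeof bits);  // well-defined way to read the bit pattern
    bits ^= 0x8000000000000000ULL;        // toggle the sign bit, leave exponent and mantissa alone
    std::memcpy(&x, &bits, sizeof x);
    return x;
}

// Small self-check: the bit flip agrees with unary minus.
inline void check_negate_by_sign_bit() {
    assert(negate_by_sign_bit(1.5) == -1.5);
    assert(negate_by_sign_bit(-2.0) == 2.0);
    assert(negate_by_sign_bit(0.0) == 0.0);  // -0.0 compares equal to 0.0
}
```

This is only meant to explain the generated code. In real code, plain -x or std::negate is clearer, and the compiler already emits the XOR for you.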

Fortunately the data in std::vector is contiguous, so you can multiply by -1 using vector intrinsics (using unaligned loads/stores and special handling of any leftover elements at the end). Or use ippsMulC_64f / ippsMulC_64f_I from Intel's IPP library (you'll struggle to write something faster), which will use the largest vector registers available on your platform: https://software.intel.com/en-us/ipp-dev-reference-mulc

Update: to clear up some confusion in the comments, the full version of Intel IPP is free (although you can pay for support) and is available for Linux, Windows and macOS.
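For the hand-written intrinsics route, a rough sketch could look like the following. It assumes an x86 target with SSE2 and falls back to a scalar loop elsewhere. Note that it flips the sign bit with an XOR instead of multiplying by -1, which is a substitution on my part. The names negate_in_place, check_negate_in_place and NEGATE_USE_SSE2 are made up for this example, and this is not IPP code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>  // SSE2 intrinsics
#define NEGATE_USE_SSE2 1
#endif

// Hypothetical helper: negate every element of v in place.
inline void negate_in_place(std::vector<double>& v) {
    double* p = v.data();
    const std::size_t n = v.size();
    std::size_t i = 0;
#ifdef NEGATE_USE_SSE2
    const __m128d sign_mask = _mm_set1_pd(-0.0);  // only the sign bit is set in each lane
    for (; i + 2 <= n; i += 2) {
        const __m128d x = _mm_loadu_pd(p + i);           // unaligned load of 2 doubles
        _mm_storeu_pd(p + i, _mm_xor_pd(x, sign_mask));  // flip both sign bits, store back
    }
#endif
    for (; i < n; ++i) p[i] = -p[i];  // scalar tail (or the whole array without SSE2)
}

// Small self-check; the odd length exercises the scalar tail.
inline void check_negate_in_place() {
    std::vector<double> v{1.0, -2.5, 3.0};
    negate_in_place(v);
    assert(v == (std::vector<double>{-1.0, 2.5, -3.0}));
}
```

Whether this beats what the compiler generates from std::transform at -O3 is something you would have to measure; the compiler output shown in the other answers suggests it often won't.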

As others have mentioned, it completely depends on your use case. Probably the simplest way would be something like this:

 using MyVect = std::vector<double>;  // needs #include <vector>

 struct MyNegatingVect {
     MyVect data;
     bool negated = false;
     void negate() { negated = !negated; }  // O(1): only flips the flag
     // ... setter and getter need indirection ...
     // ... for example:
     MyVect::value_type at(std::size_t index) const {
         return negated ? -data.at(index) : data.at(index);
     }
 };

Whether this additional indirection for each single access is worth transforming the negation into setting a single bool depends, as already mentioned, on your use case (actually I doubt that there is a use case where this would bring any measurable benefit).

First, a generic negate function for arithmetic type vectors as an example:

#include <type_traits>
#include <vector>

...

template <typename arithmetic_type> std::vector<arithmetic_type> &
negate (std::vector<arithmetic_type> & v)
{
    static_assert(std::is_arithmetic<arithmetic_type>::value,
        "negate: not an arithmetic type vector");

    for (auto & vi : v) vi = - vi;

    // note: anticipate that a range-based for may be more amenable
    // to loop-unrolling, vectorization, etc., due to fewer compiler
    // template transforms, and contiguous memory / stride.

    // in theory, std::transform may generate the same code, despite
    // being less concise. very large vectors *may* possibly benefit
    // from C++17's 'std::execution::par_unseq' policy?

    return v;
}

Your wish for a canonical unary operator - function is going to require the creation of a temporary, in the form:

std::vector<double> operator - (const std::vector<double> & v)
{
    auto ret (v); return negate(ret);
}

Or generically:

template <typename arithmetic_type> std::vector<arithmetic_type>
operator - (const std::vector<arithmetic_type> & v)
{
    auto ret (v); return negate(ret);
}

Do not be tempted to implement the operator as:

template <typename arithmetic_type> std::vector<arithmetic_type> &
operator - (std::vector<arithmetic_type> & v)
{
    return negate(v);
}

While (- v) will negate the elements and return the modified vector without the need for a temporary, it breaks mathematical conventions by effectively setting v = - v. If that's your goal, then use the negate function. Don't break expected operator evaluation!
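To illustrate the difference, here is a small self-contained sketch that repeats the in-place negate and the copying unary minus from above and checks that -v leaves v untouched. The function check_unary_minus is a hypothetical name added only for this demonstration, and the operator returns the local copy directly (rather than the reference returned by negate) so that the return value can be moved.

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

// In-place negation, as in the negate() function above.
template <typename arithmetic_type>
std::vector<arithmetic_type>& negate(std::vector<arithmetic_type>& v) {
    static_assert(std::is_arithmetic<arithmetic_type>::value,
                  "negate: not an arithmetic type vector");
    for (auto& vi : v) vi = -vi;
    return v;
}

// Copying unary minus: the operand is const and stays unchanged.
template <typename arithmetic_type>
std::vector<arithmetic_type> operator-(const std::vector<arithmetic_type>& v) {
    auto ret(v);   // the temporary mentioned above
    negate(ret);
    return ret;    // returned by value; eligible for move / copy elision
}

// Hypothetical demonstration of the two behaviours.
inline void check_unary_minus() {
    std::vector<double> v{1.0, -2.0};
    const std::vector<double> w = -v;  // v is not modified here
    assert(v == (std::vector<double>{1.0, -2.0}));
    assert(w == (std::vector<double>{-1.0, 2.0}));
    negate(v);                         // explicit in-place negation
    assert(v == w);
}
```

So the price of the conventional operator is one copy of the vector. If N is large and you do not need the original values, calling negate directly avoids that copy.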


clang, with AVX-512 enabled, generates this loop, negating an impressive 64 doubles per iteration (8 vpxorq instructions on 8 doubles each), in between the code that handles the leading and trailing elements:

        vpbroadcastq    LCPI0_0(%rip), %zmm0
        .p2align        4, 0x90
LBB0_21:
        vpxorq  -448(%rsi), %zmm0, %zmm1
        vpxorq  -384(%rsi), %zmm0, %zmm2
        vpxorq  -320(%rsi), %zmm0, %zmm3
        vpxorq  -256(%rsi), %zmm0, %zmm4
        vmovdqu64       %zmm1, -448(%rsi)
        vmovdqu64       %zmm2, -384(%rsi)
        vmovdqu64       %zmm3, -320(%rsi)
        vmovdqu64       %zmm4, -256(%rsi)
        vpxorq  -192(%rsi), %zmm0, %zmm1
        vpxorq  -128(%rsi), %zmm0, %zmm2
        vpxorq  -64(%rsi), %zmm0, %zmm3
        vpxorq  (%rsi), %zmm0, %zmm4
        vmovdqu64       %zmm1, -192(%rsi)
        vmovdqu64       %zmm2, -128(%rsi)
        vmovdqu64       %zmm3, -64(%rsi)
        vmovdqu64       %zmm4, (%rsi)
        addq    $512, %rsi              ## imm = 0x200
        addq    $-64, %rdx
        jne     LBB0_21

gcc-7.2.0 generates a similar loop, but appears to insist on indexed addressing.

Use for_each

std::for_each(MyVec.begin(), MyVec.end(), [](double& val) { val = -val; });

or, with the C++17 parallel algorithms (requires #include <execution>):

std::for_each(std::execution::par_unseq, MyVec.begin(), MyVec.end(), [](double& val) { val = -val; });

