Why is std::vector<uint8_t>::insert 5 times faster than std::copy with the MSVC 2015 compiler?

Question

I have a trivial function that copies a byte block to std::vector:

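#include <algorithm>  // std::copy
#include <cstddef>    // size_t
#include <cstdint>    // uint8_t
#include <iterator>   // std::back_inserter
#include <vector>

std::vector<uint8_t> v;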

void Write(const uint8_t * buffer, size_t count)
{
    //std::copy(buffer, buffer + count, std::back_inserter(v));

    v.insert(v.end(), buffer, buffer + count);
}

v.reserve(<buffer size>);
v.resize(0);

Write(<some buffer>, <buffer size>);

If I use std::vector<uint8_t>::insert, it runs 5 times faster than if I use std::copy.

I compiled this code with MSVC 2015 with optimization both enabled and disabled, and got the same result.

It looks like something is strange with the std::copy or std::back_inserter implementation.

Answer 1

Standard library implementations are written with performance in mind, but that performance is achieved only when optimization is on.

Performance drops dramatically if optimization is switched off.

Trying to measure a function's performance with optimization off is as pointless as asking whether the law of gravitation would still hold if there were no mass left in the Universe.
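For anyone who wants to repeat the measurement with optimization on, here is a minimal timing sketch. It is not the asker's benchmark: the names WriteInsert, WriteCopy, TimeIt and RunBenchmark, the 1 MiB buffer and the 100 rounds are all assumptions chosen for illustration, and the numbers it prints will depend on the compiler, flags and machine.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <iterator>
#include <vector>

// Appends with the range overload of vector::insert.
void WriteInsert(std::vector<uint8_t>& v, const uint8_t* buffer, size_t count)
{
    v.insert(v.end(), buffer, buffer + count);
}

// Appends one element at a time through std::back_inserter.
void WriteCopy(std::vector<uint8_t>& v, const uint8_t* buffer, size_t count)
{
    std::copy(buffer, buffer + count, std::back_inserter(v));
}

// Runs `write` `rounds` times into a pre-reserved vector and
// returns the elapsed time in microseconds.
template <typename WriteFn>
long long TimeIt(WriteFn write, const std::vector<uint8_t>& src, int rounds)
{
    std::vector<uint8_t> v;
    v.reserve(src.size());
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < rounds; ++i) {
        v.resize(0);                          // keeps the capacity, as in the question
        write(v, src.data(), src.size());
    }
    const auto stop = std::chrono::steady_clock::now();
    // Read the result back so that it is actually used.
    if (v.size() != src.size()) std::cerr << "unexpected size\n";
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}

// Hypothetical driver: 1 MiB of arbitrary data, 100 rounds per variant.
void RunBenchmark()
{
    const std::vector<uint8_t> src(1 << 20, 0xAB);
    std::cout << "insert: " << TimeIt(WriteInsert, src, 100) << " us\n";
    std::cout << "copy:   " << TimeIt(WriteCopy, src, 100) << " us\n";
}
```

Call RunBenchmark() from main and build with optimization enabled (for example /O2 with MSVC) before reading anything into the two numbers.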

Answer 2

The call to v.insert is calling a member function of the container. The member function knows how the container is implemented, so it can do things that a more generic algorithm can't do. In particular, when inserting a range of values designated by random-access iterators into a vector, the implementation knows how many elements are being added, so it can resize the internal storage once and then just copy the elements.

The call to std::copy with an insert-iterator, on the other hand, has to call insert (push_back, in the case of std::back_inserter) for each element. It can't preallocate, because std::copy works with sequences, not containers; it doesn't know how to adjust the size of the container. So for large insertions into a vector, the internal storage gets resized each time the vector is full and a new insertion is needed. The overhead of that reallocation is amortized constant time, but the constant is much larger than the constant when only one resizing is done.
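The difference in reallocation behaviour can be observed by watching capacity(). The sketch below starts from an empty vector with no reserve call; the function names CountGrowthsPushBack and CountGrowthsInsert are made up for this illustration, and the exact growth count in the element-wise case depends on the library's growth factor.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Appends src element by element and counts how often capacity() changes.
size_t CountGrowthsPushBack(const std::vector<uint8_t>& src)
{
    std::vector<uint8_t> v;
    size_t growths = 0;
    size_t last_capacity = v.capacity();
    for (uint8_t byte : src) {
        v.push_back(byte);                 // what std::back_inserter does per element
        if (v.capacity() != last_capacity) {
            ++growths;
            last_capacity = v.capacity();
        }
    }
    return growths;
}

// Appends src with a single range insert and reports whether the
// capacity changed (0 or 1).
size_t CountGrowthsInsert(const std::vector<uint8_t>& src)
{
    std::vector<uint8_t> v;
    const size_t before = v.capacity();
    v.insert(v.end(), src.begin(), src.end());
    return v.capacity() != before ? 1 : 0;
}
```

For a source of, say, 100000 bytes, the element-wise version reports several capacity changes, while the range insert reports at most one.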

With the call to reserve (which I overlooked; thanks, @ChrisDrew), the overhead of reallocating is not as significant. But the implementation of insert knows how many values are being copied, and it knows that those values are contiguous in memory (because the iterator is a pointer), and it knows that the values are trivially copyable, so it will use std::memcpy to blast the bits in all at once. With std::copy, none of that applies; the back inserter has to check whether a reallocation is necessary, and that code can't be optimized out, so you end up with a loop that copies an element at a time, checking for the end of the allocated space for each element. That's much more expensive than a plain std::memcpy.
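To make the "insert knows the values are trivially copyable" point concrete, here is a rough sketch of how such a choice can be made at compile time. This is not MSVC's actual implementation; Append and AppendImpl are hypothetical names, and the tag dispatch on std::is_trivially_copyable is just one way to express the idea.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Bulk path: one resize, then one memcpy. Only valid for trivially copyable T.
// Unlike a real implementation, resize() zero-fills the new elements first.
// The source range must not point into v, because resize() may reallocate.
template <typename T>
void AppendImpl(std::vector<T>& v, const T* first, const T* last, std::true_type)
{
    const size_t count = static_cast<size_t>(last - first);
    if (count == 0) return;                   // nothing to copy
    const size_t old_size = v.size();
    v.resize(old_size + count);
    std::memcpy(v.data() + old_size, first, count * sizeof(T));
}

// Fallback path: element-wise copy for types that need their copy constructor.
template <typename T>
void AppendImpl(std::vector<T>& v, const T* first, const T* last, std::false_type)
{
    v.reserve(v.size() + static_cast<size_t>(last - first));
    for (; first != last; ++first) v.push_back(*first);
}

// Picks the path at compile time from the element type.
template <typename T>
void Append(std::vector<T>& v, const T* first, const T* last)
{
    AppendImpl(v, first, last, std::is_trivially_copyable<T>{});
}
```

For uint8_t the first overload is selected, so the whole block is copied with a single std::memcpy.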

In general, the more the algorithm knows about the internals of the data structure that it's accessing, the faster it can be. STL algorithms are generic, and the cost of that genericity can be more overhead than that of a container-specific algorithm.

Answer 3

With a good implementation of std::vector, v.insert(v.end(), buffer, buffer + count); might be implemented as:

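size_t count = last - first;
size_t offset = size();               // inserting at end(), so new bytes start at the old size
resize(size() + count);
memcpy(data + offset, first, count);  // elements are single bytes, so count is also the byte count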

std::copy(buffer, buffer + count, std::back_inserter(v)), on the other hand, will be implemented as:

while ( first != last )
{
   *output++ = *first++;
}

which is equivalent to:

while ( first != last )
{
   v.push_back( *first++ );
}

or (roughly):

while ( first != last )
{
   // push_back should be slightly more efficient than this
   v.resize(v.size() + 1);
   v.back() = *first++;
}

In theory the compiler could optimise the above into a memcpy, but it is unlikely to. At best you'll probably get the methods inlined, so there is no function call overhead, but the loop will still write one byte at a time, whereas a memcpy will normally use vector instructions to copy multiple bytes at once.
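The reason `*output++ = *first++` turns into a push_back per element is the way a back-insert iterator is defined. Below is a simplified, hypothetical stand-in (ByteBackInserter is not a standard type, and it lacks the typedefs a real iterator needs to work with std::copy) that shows the mechanism:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-in for
// std::back_insert_iterator<std::vector<uint8_t>>.
class ByteBackInserter
{
public:
    explicit ByteBackInserter(std::vector<uint8_t>& target) : target_(&target) {}

    // Assignment is where the work happens: each write becomes a push_back.
    ByteBackInserter& operator=(uint8_t value)
    {
        target_->push_back(value);
        return *this;
    }

    // Dereference and increment do nothing; they exist so `*out++ = x` compiles.
    ByteBackInserter& operator*() { return *this; }
    ByteBackInserter& operator++() { return *this; }
    ByteBackInserter operator++(int) { return *this; }

private:
    std::vector<uint8_t>* target_;
};

// The generic copy loop from the answer, written against the sketch above.
ByteBackInserter CopyBytes(const uint8_t* first, const uint8_t* last,
                           ByteBackInserter output)
{
    while (first != last) {
        *output++ = *first++;   // calls operator=, which calls push_back
    }
    return output;
}
```

Every iteration goes through push_back, including its capacity check, which is the per-element cost described above.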

Answer 1: score 5, 2018-08-20 11:51:18

Answer 2: score 3, 2018-08-20 12:06:24

Answer 3: score 1, accepted, 2018-08-20 12:18:33
