Is std::deque faster than std::vector for inserting at the end?

I started doing comparisons between:

  • inserting at the front of a list
  • inserting at the back of a vector
  • inserting at the front of a deque

But then I noticed that even on push_back() the deque seemed to be faster. I must be doing something wrong; I can't believe a more general container would outperform a specialized one.

My code, using Google Benchmark:

#include "benchmark/benchmark.h"
#include <deque>
#include <vector>

#define NUM_INS 1000

static void BM_InsertVector(benchmark::State& state) {
    std::vector<int> v;
    v.reserve(NUM_INS);
    while (state.KeepRunning()) {
        state.PauseTiming();
        v.clear();
        state.ResumeTiming();
        for (size_t i = 0; i < NUM_INS; i++)
            v.push_back(i);
    }
}
BENCHMARK(BM_InsertVector);

static void BM_InsertDeque(benchmark::State& state) {
    std::deque<int> v;
    while (state.KeepRunning()) {
        state.PauseTiming();
        v.clear();
        state.ResumeTiming();
        for (size_t i = 0; i < NUM_INS; i++)
            v.push_back(i);
    }
}
BENCHMARK(BM_InsertDeque);

BENCHMARK_MAIN();

Results:

Run on (1 X 2592 MHz CPU )
2016-02-18 14:03:47
Benchmark         Time(ns)    CPU(ns) Iterations
------------------------------------------------
BM_InsertVector       2820       2470     312500                                 
BM_InsertDeque        1872       1563     406977

I notice some differences when varying the number of elements, but deque always outperforms vector.

EDIT: compiler: gcc version 5.2.1, compiling with: g++ -O3 -std=c++11 push_front.cpp -lbenchmark -lpthread

I think the -O3 is actually instrumental; when I turn it off I get slightly worse deque performance.

I think the vector is slower because you're calling clear() which, depending on your STL implementation, may be freeing the underlying array storage.

If that's the case, then your reserve() call isn't helping, and your vector is continuously resizing, which requires every element to be moved to the new, larger storage.

There are basically 3 sources of cost involved in continuously appending elements to a dynamic container:

  1. Memory management.
  2. The internal bookkeeping of the container.
  3. Any operations that need to be performed on the elements themselves. Notably, any container that invalidates references on insertion is potentially moving/copying elements around.

Let's start with 1. vector keeps asking for double the memory, and deque allocates fixed-size chunks (deque is typically implemented as an array of arrays, with the lower-tier arrays being of fixed size). Asking for more memory may take longer than asking for less, but typically, unless your heap is very fragmented, asking for a big chunk all at once is the fastest way to get some memory. It's probably faster to allocate one meg once than to ask for a kilobyte 1000 times. So it seems clear that vector will eventually have the advantage here (until the container is so large that it's affected by fragmentation). However, we are not at "eventually" here: you asked for only 1000 elements. I wrote the following code: http://coliru.stacked-crooked.com/a/418b18ff8a81c1c0 . It's not very interesting, but it basically uses a trivial allocator that increments a global to see how many allocations are performed.

In the course of your benchmark, vector asks for memory 11 times, and deque only 10. deque keeps asking for the same amount; vector asks for doubling amounts. As well, vector must be calling free 10 times, and deque 0. This seems like a small win for deque.

For internal bookkeeping, vector has a simpler implementation than deque. After all, vector is just a dynamic array, and deque is an array of arrays and is strictly more complex. So this is clearly a win for vector.

Finally, operations on the elements themselves. In deque, nothing is ever moved. With vector, every new heap allocation also involves moving all the elements. It's probably optimized to use memcpy for trivial types, but even so, that's 10 calls to memcpy to copy 1, 2, 4, 8 ... 512 integers. This is clearly a win for deque.

I can speculate that cranking up to -O3 allowed very aggressive inlining of a lot of the more complex codepaths in deque, reducing the weight of source 2 (bookkeeping). But obviously, unless you do a much more detailed (very careful!) benchmark, you'll never know for sure.

Mostly, this post is to show that it's more complex than simply a specialized container vs. a more general one. I will make a prediction though (sticking my neck out, as it were): if you increase the number of elements by even, say, a factor of 2 or 4, you will not see deque win anymore. That's because deque will make 2x or 4x as many heap allocations, but vector will only make 1-2 more.

I may as well note here that deque is actually kind of an odd data structure; it's theoretically an array of arrays, but in many implementations each inner array is either a certain fixed size or just one element, whichever is larger. Also, some of its big-O guarantees are nonsense. push_back is only "constant time" because in C++, only operations on the elements themselves count towards the big O. Otherwise it should be clear that, since it's an array of arrays, the top-level array will be proportional in size to the number of elements already stored. And eventually that top-level array runs out of room, and you have to reallocate it, moving O(N) pointers. So it's not really an O(1) push_back.
