Is std::deque faster than std::vector for inserting at the end?
I started doing comparisons between:
But then I noticed that even on push_back() the deque seemed to be faster. I must be doing something wrong; I can't believe a more general container would outperform a more specialized one.
My code using Google Benchmark:
#include "benchmark/benchmark.h"
#include <deque>
#include <vector>

#define NUM_INS 1000

static void BM_InsertVector(benchmark::State& state) {
    std::vector<int> v;
    v.reserve(NUM_INS);
    while (state.KeepRunning()) {
        state.PauseTiming();
        v.clear();
        state.ResumeTiming();
        for (size_t i = 0; i < NUM_INS; i++)
            v.push_back(i);
    }
}
BENCHMARK(BM_InsertVector);

static void BM_InsertDeque(benchmark::State& state) {
    std::deque<int> v;
    while (state.KeepRunning()) {
        state.PauseTiming();
        v.clear();
        state.ResumeTiming();
        for (size_t i = 0; i < NUM_INS; i++)
            v.push_back(i);
    }
}
BENCHMARK(BM_InsertDeque);

BENCHMARK_MAIN();
Results:
Run on (1 X 2592 MHz CPU)
2016-02-18 14:03:47
Benchmark          Time(ns)    CPU(ns) Iterations
-------------------------------------------------
BM_InsertVector        2820       2470     312500
BM_InsertDeque         1872       1563     406977
I notice some differences when playing with the number of elements, but deque always outperforms vector.
EDIT: compiler: gcc version 5.2.1, compiling with: g++ -O3 -std=c++11 push_front.cpp -lbenchmark -lpthread
I think the -O3 is actually instrumental; when I turn it off I get slightly worse deque performance.
I think the vector is slower because you're calling clear() which, depending on your STL implementation, may be freeing the underlying array storage. If that's the case, then your reserve() call isn't helping, and your vector is continuously resizing, which requires every element to be moved to the new, larger storage.
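There are basically 3 sources of cost involved in continuously appending elements to a dynamic container:
1. Memory allocation.
2. The container's internal bookkeeping.
3. Operations performed on the elements themselves (copying or moving them).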
Let's start with 1. vector keeps asking for double the memory, while deque allocates fixed-size chunks (deque is typically implemented as an array of arrays, with the lower-tier arrays being of fixed size). Asking for more memory may take longer than asking for less, but typically, unless your heap is very fragmented, asking for one big chunk all at once is the fastest way to get memory. It's probably faster to allocate one megabyte once than to ask for a kilobyte 1000 times. So it seems clear that vector will eventually have the advantage here (until the container is so large that it's affected by fragmentation). However, this benchmark isn't at "eventually": you asked for only 1000 elements. I wrote the following code: http://coliru.stacked-crooked.com/a/418b18ff8a81c1c0 . It's not very interesting, but it basically uses a trivial allocator that increments a global to see how many allocations are performed.
In the course of your benchmark, vector asks for memory 11 times, and deque only 10. deque keeps asking for the same amount, while vector asks for doubling amounts. As well, vector must be calling free 10 times, and deque 0. This seems like a small win for deque.
For internal bookkeeping, vector has a simpler implementation than deque. After all, vector is just a dynamic array, while deque is an array of arrays and is strictly more complex. So this is clearly a win for vector.
Finally, the operations on the elements themselves. In deque, nothing is ever moved. With vector, every new heap allocation also involves moving all the elements. It's probably optimized to use memcpy for trivial types, but even so, that's 10 calls to memcpy to copy 1, 2, 4, 8 ... 512 integers. This is clearly a win for deque.
I can speculate that cranking up to -O3 allowed very aggressive inlining of a lot of the more complex code paths in deque, reducing the weight of 2. But obviously, unless you do a much more detailed (very careful!) benchmark, you'll never know for sure.
Mostly, this post is to show that it's more complex than simply a specialized container versus a more general one. I will make a prediction though (sticking my neck out, as it were): if you increase the number of elements by even a factor of 2 or 4, you will not see deque win anymore. That's because deque will make 2x or 4x as many heap allocations, but vector will only make 1-2 more.
I may as well note here that deque is actually kind of an odd data structure; it's theoretically an array of arrays, but in many implementations each lower-level array is either a certain size or just one element, whichever is larger. Also, some of its big O guarantees are nonsense. push_back is constant time only because, in C++, only operations on the elements themselves count towards the big O. Otherwise it should be clear that, since it's an array of arrays, the top-level array will be proportional in size to the number of elements already stored. Eventually that top-level array runs out of room, and you have to reallocate it, moving O(N) pointers. So it's not really an O(1) push_back.