
Yet another Dynamic Array vs. std::vector, but

...well, I got strange results!

I was curious about the performance of std::vector vs. that of a dynamic array. There are already many questions on this subject, so I wouldn't have mentioned it if I didn't consistently get these 'contradictory' results: vector<int> is somehow faster than a new int[]! I always thought that if there was any performance difference, it would favor the dynamic array. How is this result possible?

The code is below:

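#include <ctime>
#include <iostream>
#include <vector>

using namespace std;

int main()
{
    const int numElements = 10000000;
    long long j = 0;  // long long: the sum (~5e13) overflows a 32-bit long (e.g. on Windows)
    long long k = 0;

    vector<int> intVector(numElements);    // constructor zero-initializes every element
    int* intArray = new int[numElements];  // elements are left uninitialized

    clock_t start, finish;
    start = clock();

    for (int i = 0; i < numElements; ++i)
        intVector[i] = i;
    for (int i = 0; i < numElements; ++i)
        j += intVector[i];

    finish = clock();
    cout << "j: " << j << endl;
    // clock() returns ticks; convert to ms (CLOCKS_PER_SEC is 1000 only on some platforms, e.g. MSVC)
    cout << "Total duration: " << 1000.0 * (finish - start) / CLOCKS_PER_SEC << " ms." << endl;

    // Test Control.
    start = clock();

    for (int i = 0; i < numElements; ++i)
        intArray[i] = i;
    for (int i = 0; i < numElements; ++i)
        k += intArray[i];

    finish = clock();
    cout << "k: " << k << endl;
    cout << "Total duration: " << 1000.0 * (finish - start) / CLOCKS_PER_SEC << " ms." << endl;

    delete[] intArray;
    return 0;
}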

Optimizations were on. I separated the for loops within each start/finish block so that I could time the initializations of the array/vector separately (timed on their own, std::vector<int> and new int[] appear to perform identically).

However, with the above code I consistently get std::vector<int> winning at 26-30 ms versus 36-45 ms for the new int[].

Does anyone care to explain why the vector performs better than the dynamic array? Both were declared before the timing loops, so I expected performance to be about the same. I also tried the same idea using std::vector<int*> and new int*[] instead and got similar results, with the vector class outperforming the dynamic array, so the same holds for pointers to pointers.

Thanks for the help.

Addendum: Without optimization, std::vector loses out big time to a dynamic array (~1,400 ms vs. ~80 ms), which is the performance difference I expected. But doesn't this imply that the vector class can somehow be optimized to perform better than a standard dynamic array?

My wild guess is that the OS isn't allocating physical memory until it's first accessed. The vector constructor will initialise all the elements, so the memory will already be allocated by the time you start timing. The array memory is uninitialised (and possibly unallocated), so the time for that loop might include the allocation.

Try changing the array initialisation to int* intArray = new int[numElements](); to value-initialise its elements, and see if that changes the results.
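A minimal sketch of the suggested change, assuming C++11. The helper names make_uninitialized, make_value_initialized and fill_and_sum are hypothetical. The only point is the difference between new int[n], which leaves the elements indeterminate, and new int[n](), which sets every element to zero. The second form writes the memory before any timed loop runs, as the vector constructor does.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Default-initialization: the ints have indeterminate values, and on many
// operating systems the pages may not be backed by physical memory until
// they are first written.
std::unique_ptr<int[]> make_uninitialized(std::size_t n) {
    return std::unique_ptr<int[]>(new int[n]);
}

// Value-initialization (note the trailing "()"): every element is set to 0,
// so the memory is written up front, just as std::vector<int>(n) does.
std::unique_ptr<int[]> make_value_initialized(std::size_t n) {
    return std::unique_ptr<int[]>(new int[n]());
}

// The same fill-and-sum work the question times, applied to a raw array.
long long fill_and_sum(int* data, std::size_t n) {
    assert(data != nullptr || n == 0);
    for (std::size_t i = 0; i < n; ++i)
        data[i] = static_cast<int>(i);
    long long sum = 0;
    for (std::size_t i = 0; i < n; ++i)
        sum += data[i];
    return sum;
}
```

In C++14, std::make_unique<int[]>(n) also value-initializes its elements, so it behaves like make_value_initialized here.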

For all practical purposes, they run at exactly the same speed when used this way. vector's operator[] is typically implemented like this (MSVC version):

const_reference operator[](size_type _Pos) const
{   // subscript nonmutable sequence
    return (*(_Myfirst + _Pos));
}

... which is the same as:

const_reference operator[](size_type _Pos) const
{   // subscript nonmutable sequence
    return _Myfirst[_Pos];
}

Your test is basically just testing your compiler's ability to inline code, and it appears to be doing that nicely here.
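To make the claim concrete, here is a hypothetical, stripped-down IntBuffer class. It is not the real MSVC or libstdc++ std::vector. Its operator[] does what the quoted MSVC code does: pointer arithmetic from the start of the buffer. Once the compiler inlines that one-line function, indexing the wrapper typically compiles to the same load or store as indexing a raw int*. The assert inside operator[] is a debug-only bounds check that disappears when NDEBUG is defined.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical minimal container; only operator[] mirrors the quoted MSVC code.
class IntBuffer {
public:
    explicit IntBuffer(std::size_t n) : first_(new int[n]()), size_(n) {}
    ~IntBuffer() { delete[] first_; }
    IntBuffer(const IntBuffer&) = delete;             // keep the sketch simple:
    IntBuffer& operator=(const IntBuffer&) = delete;  // no copying

    int& operator[](std::size_t pos) {
        assert(pos < size_);     // debug-only check, compiled out with NDEBUG
        return *(first_ + pos);  // identical to first_[pos]
    }
    const int& operator[](std::size_t pos) const {
        assert(pos < size_);
        return first_[pos];
    }
    int* data() { return first_; }
    std::size_t size() const { return size_; }

private:
    int* first_;
    std::size_t size_;
};
```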

As for the explanation of the differences, any answers you get are generally going to be hypothetical without seeing the disassembly. It could have to do with better caching, better register utilization in the first case (try swapping the order of the tests and see what happens), etc. One thing worth noting is that the vector's memory will already have been accessed before the test starts, because its constructor initializes every element to T().
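A small sketch of the "swap the order" experiment, assuming C++11 or later. It times each variant with std::chrono::steady_clock instead of clock() and runs the pair in either order. It also value-initializes the array, so both containers start with memory that has already been touched. The names fill_and_sum, time_ms and run_pair are illustrative only, and the printed timings will vary by machine, compiler and flags.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <iostream>
#include <memory>
#include <vector>

// Fill a buffer with 0..n-1 and sum it: the same work as the question's loops.
template <typename Indexable>
long long fill_and_sum(Indexable& data, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        data[i] = static_cast<int>(i);
    long long sum = 0;
    for (std::size_t i = 0; i < n; ++i)
        sum += data[i];
    return sum;
}

// Time one call of f, print the elapsed milliseconds, and return f's result.
template <typename F>
long long time_ms(const char* label, F f) {
    auto start = std::chrono::steady_clock::now();
    long long result = f();
    auto finish = std::chrono::steady_clock::now();
    std::chrono::duration<double, std::milli> elapsed = finish - start;
    std::cout << label << ": " << result << " in " << elapsed.count() << " ms\n";
    return result;
}

// Run the vector and array benchmarks in the requested order; returns the sum.
long long run_pair(std::size_t n, bool vector_first) {
    std::vector<int> vec(n);                   // constructor zero-fills (touches) the memory
    std::unique_ptr<int[]> arr(new int[n]());  // value-initialized, so it is touched as well
    int* raw = arr.get();
    auto bench_vec = [&] { return fill_and_sum(vec, n); };
    auto bench_arr = [&] { return fill_and_sum(raw, n); };
    long long a = 0, b = 0;
    if (vector_first) {
        a = time_ms("vector", bench_vec);
        b = time_ms("array ", bench_arr);
    } else {
        b = time_ms("array ", bench_arr);
        a = time_ms("vector", bench_vec);
    }
    assert(a == b);  // both variants must compute the same sum
    (void)b;         // avoid an unused-variable warning under NDEBUG
    return a;
}
```

For example, calling run_pair(10000000, true) and then run_pair(10000000, false) prints both timings in both orders. If swapping the order changes which one "wins", the gap most likely comes from the measurement setup rather than from vector vs. new[].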

Unfortunately, we can't simply write little micro-tests like these and draw general conclusions from them. We could do this more before systems and optimizing compilers became so complicated. Now there are far too many variables involved to draw meaningful conclusions from anything but real-world tests.

For this same reason, we generally expect anyone who is serious about performance to actively profile their code. Things have become far too complicated for people to correctly identify the bottlenecks in their code, short of obvious algorithmic inefficiencies. (I've often seen even expert programmers, whose understanding of assembly and computer architecture is far better than mine, get this wrong when I check their hypotheses with the profiler.)

I just did this experiment. Strange behavior indeed, although I think I figured it out.

Run your benchmark twice. That is...

benchmark vector
benchmark array

benchmark vector
benchmark array

You'll notice that you get different numbers the second time. My guess? Page faults. For some reason the vector doesn't cause page faults, while the array does (presumably because the vector's constructor has already written to every element). After the pages are loaded, both run at approximately the same speed, which is what happens the second time. Does anyone know how to print the number of page faults a process has incurred so far?
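On POSIX systems, one way to answer the closing question is getrusage(RUSAGE_SELF, ...). Its ru_minflt and ru_majflt fields count the minor and major page faults the process has incurred so far. The sketch below assumes Linux or macOS. On Windows, the comparable counter is PageFaultCount in PROCESS_MEMORY_COUNTERS from GetProcessMemoryInfo, which is not shown here. It reads the minor-fault counter before and after writing to a freshly allocated, uninitialized array. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <sys/resource.h>

// Minor page faults (serviced without disk I/O) incurred by this process so far.
long minor_page_faults() {
    struct rusage usage {};
    int rc = getrusage(RUSAGE_SELF, &usage);
    assert(rc == 0);
    (void)rc;  // avoid an unused-variable warning under NDEBUG
    return usage.ru_minflt;
}

// Allocate n ints without initializing them, write every element once, and
// return how many minor faults that first write pass caused.
long faults_on_first_touch(std::size_t n) {
    std::unique_ptr<int[]> data(new int[n]);  // default-initialized: untouched memory
    volatile int* p = data.get();             // volatile keeps the writes from being optimized away
    long before = minor_page_faults();
    for (std::size_t i = 0; i < n; ++i)
        p[i] = static_cast<int>(i);
    long after = minor_page_faults();
    return after - before;
}
```

Reading minor_page_faults() around the vector loop and around the array loop in the question's benchmark should show which loop pays for the first-touch faults. The exact counts depend on the OS, the allocator and page size.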

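Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you reproduce them, please credit this site's URL or the original source. For any questions, contact: yoyou2525@163.com.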

© 2020-2024 STACKOOM.COM