C++ Array vs vector

When using a C++ vector, the time spent is 718 milliseconds, while with an array the time is almost 0 milliseconds.

Why is there so much performance difference?

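#include <tchar.h>    // _tmain, _TCHAR (Microsoft-specific)
#include <ctime>      // clock, clock_t
#include <iostream>
#include <vector>
using namespace std;

int _tmain(int argc, _TCHAR* argv[])
{
    const int size = 10000;
    clock_t start, end;

    // Test 1: vector (heap allocated, elements zero-initialized by the constructor)
    start = clock();
    vector<int> v(size*size);
    for(int i = 0; i < size; i++)
    {
        for(int j = 0; j < size; j++)
        {
            v[i*size+j] = 1;
        }
    }
    end = clock();
    // clock() returns ticks; these equal milliseconds when CLOCKS_PER_SEC is 1000
    cout<< (end - start)
        <<" milliseconds."<<endl; // 718 milliseconds

    // Test 2: automatic (stack) array, elements left uninitialized
    int f = 0;
    start = clock();
    int arr[size*size];
    for(int i = 0; i < size; i++)
    {
        for(int j = 0; j < size; j++)
        {
            arr[i*size+j] = 1;
        }
    }
    end = clock();
    cout<< ( end - start)
        <<" milliseconds."<<endl; // 0 milliseconds
    return 0;
}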

Your array arr is allocated on the stack, i.e., the compiler has calculated the necessary space at compile time. At the beginning of the function, the compiler will insert an assembler statement like

sub esp, 10000*10000*sizeof(int)

which means the stack pointer (esp) is decreased by 10000 * 10000 * sizeof(int) bytes to make room for an array of 10000^2 integers. This operation is almost instant.

The vector is heap allocated, and heap allocation is much more expensive. When the vector allocates the required memory, it has to ask the operating system for a contiguous chunk of memory, and the operating system has to perform significant work to find this chunk.

As Andreas says in the comments, all your time is spent in this line:

vector<int> v(size*size); 

Accessing the vector inside the loop is just as fast as for the array.
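
One way to check this claim is to time the constructor and the fill loop separately. The following is a minimal sketch, assuming a C++11 compiler; the struct VectorTimings and the function time_vector_fill are hypothetical names introduced here, and std::chrono::steady_clock stands in for the clock() calls used in the question.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper type: the two durations to compare, plus a checksum.
struct VectorTimings {
    double construct_ms;  // time spent in the vector constructor
    double fill_ms;       // time spent in the nested fill loop
    long long checksum;   // sum of all elements, so the loop result is used
};

VectorTimings time_vector_fill(std::size_t size) {
    using timer = std::chrono::steady_clock;
    using ms = std::chrono::duration<double, std::milli>;

    auto t0 = timer::now();
    std::vector<int> v(size * size);  // allocates and zero-initializes
    auto t1 = timer::now();

    for (std::size_t i = 0; i < size; ++i)
        for (std::size_t j = 0; j < size; ++j)
            v[i * size + j] = 1;
    auto t2 = timer::now();

    // Consume the data so the fill loop cannot be discarded as dead code.
    long long sum = 0;
    for (int x : v) sum += x;

    return {ms(t1 - t0).count(), ms(t2 - t1).count(), sum};
}
```

Comparing construct_ms with fill_ms for a given size shows how the total splits between allocation plus initialization and element access on a particular compiler and build configuration.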

For an additional overview see e.g.

Edit:

After all the comments about performance optimizations and compiler settings, I did some measurements this morning. I had to set size=3000, so I did my measurements with roughly a tenth of the original entries. All measurements were performed on a 2.66 GHz Xeon:

  1. With debug settings in Visual Studio 2008 (no optimization, runtime checks, and debug runtime) the vector test took 920 ms compared to 0 ms for the array test.

    98.48% of the total time was spent in vector::operator[], i.e., the time was indeed spent on the runtime checks.

  2. With full optimization, the vector test needed 56 ms (with a tenth of the original number of entries) compared to 0 ms for the array.

    The vector ctor required 61.72% of the total application running time.

So I guess everybody is right, depending on the compiler settings used. The OP's timing suggests an optimized build or an STL without runtime checks.

As always, the moral is: profile first, optimize second.

If you are compiling this with a Microsoft compiler, to make it a fair comparison you need to switch off iterator security checks and iterator debugging by defining _SECURE_SCL=0 and _HAS_ITERATOR_DEBUGGING=0.
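
As a sketch of what that can look like in source form (the alternative is passing the same definitions on the compiler command line), both macros need to be defined before the first standard library header is included, and consistently in every translation unit. The function fill_ones is a hypothetical name that mirrors the question's loop; on compilers other than Microsoft's, the two macros should have no effect.

```cpp
// MSVC-specific switches; they must precede any standard library header.
#define _SECURE_SCL 0
#define _HAS_ITERATOR_DEBUGGING 0

#include <cstddef>
#include <vector>

// Hypothetical helper: same fill pattern as the vector test in the question.
std::vector<int> fill_ones(std::size_t size) {
    std::vector<int> v(size * size);
    for (std::size_t i = 0; i < size; ++i)
        for (std::size_t j = 0; j < size; ++j)
            v[i * size + j] = 1;
    return v;
}
```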

Secondly, the constructor you are using initialises each vector element to zero, whereas you are not memsetting the array to zero before filling it. So you are traversing the vector twice.

Try:

vector<int> v; 
v.reserve(size*size);
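
Note that reserve only allocates capacity; the vector's size stays 0, so writing through v[i*size+j] afterwards would be out of bounds. A minimal sketch of how the loop could be adapted, assuming the elements are appended with push_back in the same row-major order (the function name fill_with_reserve is hypothetical):

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: a single allocation up front, then each element
// is constructed exactly once by push_back.
std::vector<int> fill_with_reserve(std::size_t size) {
    std::vector<int> v;
    v.reserve(size * size);  // capacity only; v.size() is still 0 here
    for (std::size_t i = 0; i < size; ++i)
        for (std::size_t j = 0; j < size; ++j)
            v.push_back(1);  // lands at index i*size+j
    return v;
}
```

This avoids the zero-initialization pass, at the cost of a capacity check inside every push_back call.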

Change the assignment to e.g. arr[i*size+j] = i*j, or some other non-constant expression. I think the compiler optimizes away the whole loop, as the assigned values are never used, or replaces the array with some precalculated values, so that the loop isn't even executed and you get 0 milliseconds.

Having changed 1 to i*j, I get the same timings for both vector and array, unless I pass the -O1 flag to gcc; then in both cases I get 0 milliseconds.

So, first of all, double-check whether your loops are actually executed.
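
One simple way to make it harder for the compiler to drop the loops is to consume the result, for example by returning or printing a checksum of the data. The sketch below assumes that approach; fill_and_sum is a hypothetical helper, and an aggressive optimizer may still simplify the computation, so inspecting the generated assembly remains the definitive check.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: fills with the non-constant value i*j and returns
// the sum of all elements so the stores have an observable effect.
long long fill_and_sum(std::size_t size) {
    std::vector<int> v(size * size);
    for (std::size_t i = 0; i < size; ++i)
        for (std::size_t j = 0; j < size; ++j)
            v[i * size + j] = static_cast<int>(i * j);

    long long sum = 0;
    for (int x : v) sum += x;
    return sum;
}
```

Printing the returned value after stopping the timer keeps the fill loop relevant to the program's output.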

To get a fair comparison, I think something like the following should be suitable:

#include <sys/time.h>
#include <vector>
#include <iostream>
#include <algorithm>
#include <numeric>


int main()
{
  static size_t const size = 7e6;

  timeval start, end;
  int sum;

  gettimeofday(&start, 0);
  {
    std::vector<int> v(size, 1);
    sum = std::accumulate(v.begin(), v.end(), 0);
  }
  gettimeofday(&end, 0);

  std::cout << "= vector =" << std::endl
        << "(" << end.tv_sec - start.tv_sec
        << " s, " << end.tv_usec - start.tv_usec
        << " us)" << std::endl
        << "sum = " << sum << std::endl << std::endl;

  gettimeofday(&start, 0);
  int * const arr =  new int[size];
  std::fill(arr, arr + size, 1);
  sum = std::accumulate(arr, arr + size, 0);
  delete [] arr;
  gettimeofday(&end, 0);

  std::cout << "= Simple array =" << std::endl
        << "(" << end.tv_sec - start.tv_sec
        << " s, " << end.tv_usec - start.tv_usec
        << " us)" << std::endl
        << "sum = " << sum << std::endl << std::endl;
}

In both cases, dynamic allocation and deallocation are performed, as well as accesses to elements.

On my Linux box:

$ g++ -O2 foo.cpp 
$ ./a.out 
= vector =
(0 s, 21085 us)
sum = 7000000

= Simple array =
(0 s, 21148 us)
sum = 7000000

Both the std::vector<> and array cases have comparable performance. The point is that std::vector<> can be just as fast as a simple array if your code is structured appropriately.


On a related note, switching off optimization makes a huge difference in this case:

$ g++ foo.cpp 
$ ./a.out 
= vector =
(0 s, 120357 us)
sum = 7000000

= Simple array =
(0 s, 60569 us)
sum = 7000000

Many of the optimization assertions made by folks like Neil and jalf are entirely correct.

HTH!

EDIT: Corrected code to force vector destruction to be included in the time measurement.

You are probably using VC++, in which case by default standard library components perform many checks at run-time (e.g. whether an index is in range). These checks can be turned off by defining some macros as 0 (I think _SECURE_SCL).

Another thing is that I can't even run your code as is: the automatic array is way too large for the stack. When I make it global, then with MinGW 3.5 the times I get are 627 ms for the vector and 26875 ms (!!) for the array, which indicates there are really big problems with an array of this size.
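
To make the size concrete: 10000 × 10000 elements of a 4-byte int come to 400,000,000 bytes, roughly 381 MiB, while default stack sizes are typically in the range of 1 to 8 MiB. The sketch below computes that figure; array_bytes is a hypothetical helper, and the 4-byte int is an assumption that holds on common desktop platforms.

```cpp
#include <cstddef>

// Hypothetical helper: bytes needed for a size x size array of int.
constexpr std::size_t array_bytes(std::size_t size) {
    return size * size * sizeof(int);
}
```

With a 4-byte int, array_bytes(10000) evaluates to 400000000.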

As to this particular operation (filling with value 1), you could use the vector's constructor:

std::vector<int> v(size * size, 1);

and the fill algorithm for the array:

std::fill(arr, arr + size * size, 1);

Two things. One, operator[] is much slower for vector. Two, vector in most implementations will behave weirdly at times when you add one element at a time. I don't just mean that it allocates more memory; it does some genuinely bizarre things at times.

The first one is the main issue. For a mere million bytes, even reallocating the memory a dozen times should not take long (it won't do it on every added element).

In my experiments, preallocating doesn't change its slowness much. When the contents are actual objects, it basically grinds to a halt if you try to do something simple like sort it.

Conclusion: don't use STL or MFC vectors for anything large or computation heavy. They are implemented poorly/slowly and cause lots of memory fragmentation.

When you declare the array, it lives on the stack (or in the static memory zone), which is very fast, but its size cannot be increased.

When you declare the vector, it allocates dynamic memory, which is not as fast but is more flexible: you can change the size later instead of dimensioning it to the maximum size up front.

When profiling code, make sure you are comparing similar things.

vector<int> v(size*size); 

initializes each element in the vector,

int arr[size*size]; 

doesn't. Try

int arr[size * size];
memset( arr, 0, size * size * sizeof(int) ); // memset counts bytes, not elements

and measure again...
