
C++, bad performance when instantiating a std::vector

I have a question regarding the instantiation of std::vector. I compared instantiating a std::vector with dynamically allocating an array of the same size. I expected the std::vector to take a little longer, but the difference in performance is huge.

For the array I get 53 us; for the std::vector I get 4338 us.

My code:

#include <chrono>
#include <cstdlib>   // calloc, free
#include <vector>
#include <iostream>

int main() {
    unsigned int NbItem = 1000000 ;
    std::chrono::time_point<std::chrono::system_clock> start, middle ,end;
    start = std::chrono::system_clock::now() ;
    float * aMallocArea = (float *)calloc(sizeof(float)*NbItem,0) ;
    middle = std::chrono::system_clock::now() ;
    std::vector<float> aNewArea ;
    middle = std::chrono::system_clock::now() ;
    aNewArea.resize(NbItem) ;
    //float * aMallocArea2 = new float[NbItem];
    end = std::chrono::system_clock::now() ;
    std::chrono::duration<double> elapsed_middle = middle-start;
    std::chrono::duration<double> elapsed_end = end-middle;
    std::cout << "ElapsedTime CPU  = " << elapsed_middle.count()*1000000 << " (us) " << std::endl ;
    std::cout << "ElapsedTime CPU  = " << elapsed_end.count()*1000000 << " (us) " << std::endl ;
    free(aMallocArea) ;
    return 0;
}

Even if I create a vector of size 0, I see this difference. Do you know why I get such bad performance when instantiating a std::vector? Do you know how to improve this? (I tried the compilation option -O3, but it did not noticeably improve the result.)

Compilation line: g++ --std=c++11 -o test ./src/test.cpp

Compiler version:

g++ --version
g++ (Debian 4.7.2-5) 4.7.2
Copyright (C) 2012 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Do you realize that this:

float * aMallocArea = (float *)calloc(sizeof(float)*NbItem, 0);

means "allocate sizeof(float)*NbItem items, each of size zero"? The first argument of calloc is the number of elements and the second is the size of each element, so the call performs an allocation of zero bytes.

Even once you correct this, the calloc form will be much faster in many cases. calloc implementations are capable of merely "reserving" a region of memory and returning a pointer to it. The OS then maps the virtual memory only when you actually access it.

A vector, on the other hand, actually goes through and initializes (constructs) its elements. No implementation I know of checks that a) the type is POD, b) the memory is to be zeroed, and c) the allocator returns zeroed memory. So this initialization process can cost quite a bit compared to calloc.

So the "C" version does next to nothing (once you fix your program), while the "C++" version goes through, initializes every element, and touches all the memory in the allocation. It will be much slower.

That is very rarely a good reason to favor the C version, even where performance matters. In practice, you should only allocate memory you actually need. Once you start using the memory for something, the times will even out (e.g. in the C version, it will take time to map the memory when you access it later on). If you were to create a second timed test which, say, computed the average of the arrays' elements, the C++ version would likely be faster in that test on your implementation because the memory is already mapped and initialized, whereas the C version would perform the mapping and initialization as you read the memory.


 