Why is vector always slower than a C array, at least in this case?

Question

I am trying to find all the primes not greater than n using the Sieve of Eratosthenes. I have the following code, with the sieve implemented once with a vector and once with a C array, and I have found that the C array is almost always faster.

Using vector:

int countPrimes_vector(int n) {                  
    int res = 0; 
    vector<char>bitmap(n);
    memset(&bitmap[0], '1', bitmap.size() * sizeof( bitmap[0]));
    //vector<bool>bitmap(n, true); Using this one is even slower!!

    for (int i = 2; i<n; ++i){

        if(bitmap[i]=='1')++res;
        if(sqrt(n)>i)
        {
             for(int j = i*i; j < n; j += i) bitmap[j] = '0';
        }
    }

    return res;
}

Using C array:

int countPrimes_array(int n) {  

    int res = 0; 
    bool * bitmap = new bool[n];
    memset(bitmap, true, sizeof(bool) * n);
    for (int i = 2; i<n; ++i){

        if(bitmap[i])++res;
        if(sqrt(n)>i)
        {
             for(int j = i*i; j < n; j += i) bitmap[j] = false;
        }
    }
    delete []bitmap;
    return res;
}

The test code:

clock_t t;
t = clock();
int a;
for(int i=0; i<10; ++i)a = countPrimes_vector(8000000); 
t = clock() - t;
cout<<"time for vector = "<<t<<endl;

t = clock();
int b;
for(int i=0; i<10; ++i)b = countPrimes_array(8000000); 
t = clock() - t;
cout<<"time for array = "<<t<<endl;

The output:

 time for vector = 32460000
 time for array = 29840000

I have tested this many times, and the C array is always faster. What's the reason behind it?

I have often heard that vector and a C array have the same performance, and that vector should always be used because it is a standard container. Is this statement true, at least generally speaking? In what cases should a C array be preferred?

EDIT:

As the comments below suggest, after turning on optimization with -O2 or -O3 (originally it was compiled with g++ test.cpp), the time difference between vector and C array disappears; on some occasions vector is faster than the C array.

Answer 1

Your comparisons contain inconsistencies which would explain the differences, and another factor could be compiling without sufficient optimization. Some implementations have a lot of additional code in the debug builds of the STL; for instance, MSVC does bounds checking on vector element accesses, which significantly reduces speed in debug builds.
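One way to take the inconsistencies out of the comparison is to run the same sieve loop over both kinds of storage, with the same element type and the same initial value. The following is a minimal sketch under that assumption; `count_primes_in`, `count_primes_vector` and `count_primes_array` are hypothetical names chosen for illustration, and the loop skips composite `i`, which the question's code does not.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Sieve of Eratosthenes over any buffer of char flags that supports flags[i].
// Counts the primes strictly below n (the same range as the question's loop).
template <typename Flags>
int count_primes_in(Flags& flags, int n) {
    int res = 0;
    for (int i = 2; i < n; ++i) {
        if (!flags[i]) continue;  // i was crossed off earlier, so it is composite
        ++res;
        // Start at i * i; use long long so the product cannot overflow int.
        for (long long j = static_cast<long long>(i) * i; j < n; j += i)
            flags[j] = 0;
    }
    return res;
}

// Same element type (char) and same initial value (1) as the array version.
int count_primes_vector(int n) {
    assert(n >= 0);
    std::vector<char> flags(static_cast<std::size_t>(n), 1);
    return count_primes_in(flags, n);
}

int count_primes_array(int n) {
    assert(n >= 0);
    char* flags = new char[static_cast<std::size_t>(n)];
    std::fill(flags, flags + n, 1);
    const int res = count_primes_in(flags, n);
    delete[] flags;
    return res;
}
```

With both versions sharing one loop body, any remaining timing difference comes from the storage itself rather than from '1'/'0' versus true/false or from different initialization code.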

The following code shows MUCH closer performance between the two, and the remaining difference is probably just a lack of samples (ideone has a timeout limit of 5s).

#include <vector>
#include <cmath>
#include <cstring>

int countPrimes_vector(int n) {  
    int res = 0; 
    std::vector<bool> bitmap(n, true);
    for (int i = 2; i<n; ++i){
        if(bitmap[i])
          ++res;
        if(sqrt(n)>i)
        {
             for(int j = i*i; j < n; j += i) bitmap[j] = false;
        }
    }
    return res;
}

int countPrimes_carray(int n) {  
    int res = 0; 
    bool* bitmap = new bool[n];
    memset(bitmap, true, sizeof(bool) * n);
    for (int i = 2; i<n; ++i){

        if(bitmap[i])++res;
        if(sqrt(n)>i)
        {
             for(int j = i*i; j < n; j += i) bitmap[j] = false;
        }
    }
    delete []bitmap;
    return res;
}

#include <chrono>
#include <iostream>

using namespace std;

void test(const char* description, int (*fn)(int))
{
    using clock = std::chrono::steady_clock;
    using ms = std::chrono::milliseconds;

    auto start = clock::now();

    int a = 0;
    for(int i=0; i<9; ++i)
        a = fn(8000000);  // call the function under test, not a hard-coded one

    auto end = clock::now();
    auto diff = std::chrono::duration_cast<ms>(end - start);

    std::cout << "time for " << description << " = " << diff.count() << "ms\n";
}

int main()
{
    test("carray", countPrimes_carray);
    test("vector", countPrimes_vector);
}

Live demo: http://ideone.com/0Y9gQx

time for carray = 2251ms
time for vector = 2254ms

On some runs, though, the carray was 1-2 ms slower. Again, that's insufficient samples on a shared resource.
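To reduce the noise from a shared machine, one common approach is to time each run separately and keep the fastest sample. The helper below is a rough sketch of that idea; `best_of_ms` and `sample_work` are hypothetical names, and the choice of "minimum of N runs" is an assumption about what counts as a fair measurement, not something the answer prescribes.

```cpp
#include <cassert>
#include <chrono>
#include <limits>

// Runs fn(arg) `samples` times and returns the fastest run in milliseconds.
inline double best_of_ms(int (*fn)(int), int arg, int samples) {
    using clock = std::chrono::steady_clock;
    assert(samples > 0);
    double best = std::numeric_limits<double>::max();
    volatile int sink = 0;  // volatile store so the call's result is always used
    for (int s = 0; s < samples; ++s) {
        const auto start = clock::now();
        sink = fn(arg);
        const auto stop = clock::now();
        const std::chrono::duration<double, std::milli> elapsed = stop - start;
        if (elapsed.count() < best) best = elapsed.count();
    }
    (void)sink;
    return best;
}

// Stand-in workload for illustration: sums the integers below n.
inline int sample_work(int n) {
    int total = 0;
    for (int i = 0; i < n; ++i) total += i;
    return total;
}
```

A call such as `best_of_ms(countPrimes_carray, 8000000, 9)` would then report the best of nine runs instead of their sum.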

--- EDIT ---

In your main comments you ask "why optimization can make a difference".

std::vector<int> v = { 1, 2, 3 };
int b[] = { 1, 2, 3 };

We have two "arrays" of 3 elements, so consider the following:

v[10]; // illegal!
b[10]; // illegal!

Debug versions of the STL can often catch this at run time (and, in some scenarios, at compile time). The array access may just result in bad data or a crash.
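Independently of any debug mode, the standard library also offers a checked accessor: `std::vector::at` throws `std::out_of_range` for an invalid index, whereas `operator[]` with an invalid index is undefined behavior. A small sketch of the difference follows; `at_rejects` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Returns true if v.at(idx) rejected the index by throwing std::out_of_range.
bool at_rejects(const std::vector<int>& v, std::size_t idx) {
    try {
        (void)v.at(idx);  // checked access: throws when idx >= v.size()
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}
```

The check inside `at` is the kind of extra work that a debug-mode `operator[]` performs on every access.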

Additionally, the STL is implemented using many small member-function calls to things like size(), and because vector is a class, [] is actually a facade over a function call (operator[]).

The compiler can eliminate many of these, but that's optimization. If you don't optimize, then something like

std::vector<int> v;
v[10];

does something roughly like:

int* data() { return M_.data_; }

v.operator[](size_t idx = 10) {
    if (idx >= this->size()) {
        raise exception("invalid [] access");
    }
    return *(data() + idx);
}

and even though data() is an "inlinable" function, the unoptimized code leaves it as a real call to make debugging easier. When you build with optimization, the compiler recognizes that the implementations of these functions are so trivial that it can substitute them into the call sites, and it quickly winds up simplifying all of the above to an operation much more like a plain array access.

For example, in the above case, it may first reduce operator[] to

v.operator[](size_t idx = 10) {
    if (idx >= this->size()) {
        raise exception("invalid [] access");
    }
    return *(M_.data_ + idx);
}

And since compiling without debugging probably removes the bounds check, it becomes

v.operator[](size_t idx = 10) {
    return *(M_.data_ + idx);
}

so now the inliner can reduce

x = v[1];

to

x = *(v.M_.data_ + 1); // comparable to v.M_.data_[1];
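The inlining chain above can be mimicked with a compilable toy. `TinyVec` below is a hypothetical, heavily simplified stand-in for a vector (it neither owns nor frees its storage), written only to show that the layered `operator[]` and the fully inlined expression address the same element.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for std::vector<int>; illustrates the access path only.
struct TinyVec {
    int* data_;
    std::size_t size_;

    int* data() { return data_; }
    std::size_t size() const { return size_; }

    // Unchecked access, layered on data() as in the pseudocode above.
    int& operator[](std::size_t idx) { return *(data() + idx); }
};

// Access through the member functions, as written in source code.
inline int read_via_operator(TinyVec& v, std::size_t idx) { return v[idx]; }

// What the call is expected to reduce to once everything is inlined.
inline int read_inlined(TinyVec& v, std::size_t idx) { return *(v.data_ + idx); }
```

Comparing the assembly of the two read functions at -O0 and at -O2 is one way to see the function calls disappear.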

There is a tiny penalty. The C array involves the data block in memory and a single local variable, which fits into a register, pointing to that block; your references are directly relative to that.

With a vector, though, you have the vector object, which holds a pointer to the data, a size and a capacity variable:

vector<T>  // pseudo code
{
    T* ptr;
    size_t size;
    size_t capacity;
}

If you were counting machine instructions, the vector has 3 variables to initialize and manage.

When you write

x = v[1];

given the above approximation of vector, you are saying something along the lines of:

T* ptr = v.data();
x = ptr[1];

When building with optimization, the compiler is usually smart enough to recognize that it can do the first line before the loop, but this tends to cost a register.

T* ptr = v.data(); // in debug, function call, otherwise inlined.
for ... {
    x = ptr[1];
}

So you're probably looking at a handful more machine instructions per iteration of your test function or, on a modern processor, maybe a nanosecond or two of extra wall time.
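The hoisting described above can also be written out by hand. The sketch below contrasts indexing through the vector on every iteration with reading the data pointer and size once before the loop; `sum_indexed` and `sum_hoisted` are hypothetical names, and with optimization enabled both are expected to compile to very similar code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Indexes through the vector object on every iteration.
long long sum_indexed(const std::vector<int>& v) {
    long long total = 0;
    for (std::size_t i = 0; i < v.size(); ++i) total += v[i];
    return total;
}

// Reads the data pointer and size once, then uses plain pointer indexing.
long long sum_hoisted(const std::vector<int>& v) {
    const int* ptr = v.data();
    const std::size_t n = v.size();
    long long total = 0;
    for (std::size_t i = 0; i < n; ++i) total += ptr[i];
    return total;
}
```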

Answer 1: score 7, accepted, 2015-06-11 00:16:32