Vectors and arrays in C++

Question

The performance difference between C++ vectors and plain arrays has been discussed extensively, for example here and here. Those discussions usually conclude that vectors and arrays perform about the same when elements are accessed with the [] operator and the compiler is allowed to inline functions. That is what I expected, but I came across a case where it does not seem to hold.

The code below is quite simple. It takes a 3D volume, sweeps over it, and applies a small 3D stencil ("mask") a fixed number of times (T). The VERSION macro selects how the volume is declared and accessed:
- VERSION=2: declared as a vector and accessed through at().
- VERSION=1: declared as a vector and accessed through [].
- VERSION=0: declared as a plain array.

#include <vector>
#define NX 100
#define NY 100
#define NZ 100
#define H  1
#define C0 1.5f
#define C1 0.25f
#define T 3000

#if !defined(VERSION) || VERSION > 2 || VERSION < 0 
  #error "Bad version"
#endif 

#if VERSION == 2
  #define AT(_a_,_b_) (_a_.at(_b_))
  typedef std::vector<float> Field;
#endif 

#if VERSION == 1
  #define AT(_a_,_b_) (_a_[_b_])
  typedef std::vector<float> Field;
#endif 

#if VERSION == 0
  #define AT(_a_,_b_) (_a_[_b_])
  typedef float* Field;
#endif 

#include <iostream>
#include <omp.h>

int main(void) {

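#if VERSION != 0 
  Field img(NX*NY*NZ);                /* std::vector value-initializes every element to 0.0f */
#else
  Field img = new float[NX*NY*NZ]();  /* () value-initializes to 0.0f, matching std::vector */
#endif 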


  double end, begin;
  begin = omp_get_wtime();  

  const int csize = NZ;
  const int psize = NZ * NX;
  for(int t  = 0; t < T; t++ ) {

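    /* Sweep over the 3D volume and apply the "blurring" coefficients in place.
       With OpenMP, threads on adjacent j planes may read values another thread is writing. */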
    #pragma omp parallel for
    for(int j = H; j < NY-H; j++ ) { 
      for( int i = H; i < NX-H; i++ ) {
        for( int k = H; k < NZ-H; k++ ) {
          int eindex = k+i*NZ+j*NX*NZ;
          AT(img,eindex) = C0 * AT(img,eindex) +
              C1 * (AT(img,eindex - csize) +
                    AT(img,eindex + csize) + 
                    AT(img,eindex - psize) + 
                    AT(img,eindex + psize) );
        }
      }
    }
  }

  end = omp_get_wtime();
  std::cout << "Elapsed "<< (end-begin) <<" s." << std::endl;

  /* Read img after the timed region so the computation cannot be optimized
     away and the field is freed only after timing */
  #define WHATEVER 12.f
  if( img[ NZ ] == WHATEVER ) { 
    std::cout << "Whatever" << std::endl;
  }


#if VERSION == 0
  delete[] img;
#endif 

}

One would expect the code to perform the same with VERSION=1 and VERSION=0, but the output is as follows:

VERSION 2 : Elapsed 6.94905 s.
VERSION 1 : Elapsed 4.08626 s.
VERSION 0 : Elapsed 1.97576 s.

If I compile without OpenMP (I only have two cores), I see the same pattern:

VERSION 2 : Elapsed 10.9895 s.
VERSION 1 : Elapsed 7.14674 s.
VERSION 0 : Elapsed 3.25336 s.
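Let t_v be the elapsed time of VERSION v. Expressed as ratios, the []-indexed vector (VERSION 1) takes roughly twice as long as the plain array (VERSION 0) in both builds. The gap therefore does not come from threading:

$$
\frac{t_1}{t_0} = \frac{4.08626\,\text{s}}{1.97576\,\text{s}} \approx 2.07 \;\text{(OpenMP)}, \qquad
\frac{t_1}{t_0} = \frac{7.14674\,\text{s}}{3.25336\,\text{s}} \approx 2.20 \;\text{(serial)}
$$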

I always compile with GCC 4.6.3 and the options -fopenmp -finline-functions -O3, dropping -fopenmp when I compile without OpenMP. Am I doing something wrong, for example when compiling? Or should we really expect this difference between vectors and arrays?

PS: I cannot use std::array because the compiler I depend on does not support the C++11 standard. With ICC 13.1.2 I get similar behavior.

Answer 1

I tried your code, using chrono to measure the time.

I compiled it with clang (version 3.5) and libc++:

clang++ test.cc -std=c++1y -stdlib=libc++ -lc++abi -finline-functions -O3

The results for VERSION 0 and VERSION 1 are essentially the same: both average about 3.4 seconds. I am running in a virtual machine, so it is slower.

Then I tried g++ (version 4.8.1):

g++ test.cc -std=c++1y -finline-functions -O3

This time VERSION 0 takes roughly 4.4 seconds and VERSION 1 roughly 5.2 seconds.

Then I tried clang++ with libstdc++:

clang++ test.cc -std=c++11 -finline-functions -O3

Voilà, the result is back to 3.4 seconds.

So it is purely an optimization "bug" in g++. With clang, VERSION 0 and VERSION 1 run equally fast whether libc++ or libstdc++ is used.

Answer 1 (2 votes, accepted, 2014-01-21 09:48:11)
