
C/C++ performance of static arrays vs dynamic arrays

When performance is essential to an application, should I consider whether to declare an array on the stack or on the heap? Allow me to outline why this question has come to mind.

Since arrays in C/C++ are not first-class objects and decay to pointers, the compiler uses the provided index to perform pointer arithmetic to access elements. My understanding is that this procedure differs between a statically declared array and a dynamically declared array once you go past the first dimension.

Suppose I declare an array on the stack as follows:

  int array[2][3] = { 0, 1, 2, 3, 4, 5 };
  //In memory        { row1 } { row2 }

This array would be stored in row-major order in a single contiguous block of memory. This means that when I access an element of the array, the compiler must perform some addition and multiplication to work out the correct location.

So if I were to do the following:

  int x = array[1][2]; // x = 5

The compiler would then use this formula, where:

  i = row index
  j = column index
  n = number of elements in a single row (here n = 3)
  array = pointer to the first element (&array[0][0])

  *(array + (i*n) + j)
  *(array + (1*3) + 2)   // offset 5, which holds the value 5

This means that if I were to loop over this array to access each of its elements, an additional multiplication would be performed for each indexed access.
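To check the formula by hand, here is a small C sketch that reads an element of a true int[2][3] array through a plain int pointer. It uses i*n + j with n = 3, the number of elements per row. The function name manual_row_major is hypothetical. The sketch only shows that the multiply-and-add lands on the same element as ordinary indexing; real code should simply write a[i][j] and let the compiler do this arithmetic.

```c
#include <stddef.h>

/* Hypothetical helper: read element (i, j) of an int[2][3] array using the
 * row-major formula *(base + i*n + j), where n is the row length. */
int manual_row_major(int (*a)[3], size_t i, size_t j)
{
    const int *base = &a[0][0];   /* pointer to the first int element */
    const size_t n = 3;           /* number of elements in one row */
    return *(base + i * n + j);   /* one multiply and two adds, then a load */
}
```

For example, with int arr[2][3] = { {0, 1, 2}, {3, 4, 5} }, manual_row_major(arr, 1, 2) reads offset 5, the same element as arr[1][2].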

For an array declared on the heap, the paradigm is different and requires a multi-stage solution. Note: I could also use the C++ new operator here, but I believe there is no difference in how the data is represented.

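  #include <stdlib.h>

  int main(void) {
      int ** array;
      int rows = 2, cols = 3;
      // Create a 2 by 3 2D array on the heap: one array of row pointers,
      // then a separate malloc() for each row
      array = malloc(rows * sizeof(int*));
      for (int i = 0; i < rows; i++) {
          array[i] = malloc(cols * sizeof(int));
      }

      // Populating the array
      int number = 0;
      for (int i = 0; i < rows; i++) {
          for (int j = 0; j < cols; j++) {
              array[i][j] = number++;
          }
      }

      // Release each row, then the table of row pointers
      for (int i = 0; i < rows; i++) {
          free(array[i]);
      }
      free(array);
      return 0;
  }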

Since the array is now dynamic, it is represented as a one-dimensional array of pointers, each pointing to its own one-dimensional array. Here is an ASCII picture:

                int *         int int int
int ** array -> [0]  ----->   0   1   2
                [1]  ----->   3   4   5

This would imply that multiplication is no longer involved, right? Suppose I do the following:

int x = array[1][1]; // x = 4

This would first perform indirection/pointer arithmetic on array to fetch array[1], the pointer to the second row. It would then do the same again on that pointer to access the second element. Am I correct?
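The two dependent loads can be written out explicitly. The following C sketch builds the same 2 by 3 pointer-to-pointer structure and reads array[i][j] as *(*(array + i) + j). The first load fetches the row pointer and the second fetches the int. The helper names make_jagged_2x3, read_explicit and free_jagged are hypothetical.

```c
#include <stdlib.h>

/* Hypothetical helper: build the 2x3 pointer-to-pointer layout, filled 0..5.
 * Returns NULL if any allocation fails (freeing what was already allocated). */
int **make_jagged_2x3(void)
{
    int **array = malloc(2 * sizeof *array);      /* table of row pointers */
    if (array == NULL) return NULL;
    int number = 0;
    for (int i = 0; i < 2; i++) {
        array[i] = malloc(3 * sizeof *array[i]);  /* each row: its own block */
        if (array[i] == NULL) {
            while (i-- > 0) free(array[i]);
            free(array);
            return NULL;
        }
        for (int j = 0; j < 3; j++) array[i][j] = number++;
    }
    return array;
}

/* Equivalent to array[i][j], with the two loads written out. */
int read_explicit(int **array, size_t i, size_t j)
{
    int *row = *(array + i);   /* load 1: the pointer to row i */
    return *(row + j);         /* load 2: the int at column j of that row */
}

void free_jagged(int **array, size_t rows)
{
    for (size_t i = 0; i < rows; i++) free(array[i]);
    free(array);
}
```

The second load cannot start until the first has returned the row pointer. That dependency, rather than any multiplication saved, is usually what matters for speed.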

Now that there is some context, back to the question. If I am writing code for an application that requires crisp performance, such as a game that has around 0.016 seconds to render a frame, should I think twice about using an array on the stack vs the heap? I realize there is a one-time cost for using malloc or the new operator. But at a certain point (as in Big O analysis), when the data set becomes large, would one be better off iterating through a dynamic array to avoid row-major indexing?

The following applies to "plain" C (not C++).

First, let's clear up some terminology.

"static" is a keyword in C that drastically changes the way a variable is allocated and accessed if it is applied to a variable declared within a function.

In C, there are 3 places where a variable (including an array) may sit:

  • Stack: function-local variables that are not declared static.
  • Data section: space is allocated for these when the program starts. This covers all global variables (static or not; for globals the keyword relates to visibility) and any function-local variables declared static.
  • Heap: dynamically allocated memory (malloc() & free()) referred to by a pointer. You access this data only through pointers.
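Here is a small C sketch that places one array in each of the three locations. The names are made up for illustration. On many toolchains, a zero-initialized global goes into .bss rather than .data, but it is still static storage that is set up at program start. Calling the function twice shows that the static local keeps its value between calls, while the stack and heap arrays are fresh each time.

```c
#include <stdlib.h>

int data_array[4];                 /* Data section: global, exists for the whole run */

/* Hypothetical demo: writes to one array in each storage location and
 * returns the sum of their first elements (-1 if malloc fails). */
int touch_all_three(void)
{
    static int static_local[4];    /* Data section too: function-local but static */
    int stack_array[4] = {0};      /* Stack: automatic storage, gone on return */
    int *heap_array = malloc(4 * sizeof *heap_array);  /* Heap: only via a pointer */
    if (heap_array == NULL) return -1;

    data_array[0] = 1;
    static_local[0] += 1;          /* keeps its value between calls */
    stack_array[0] = 1;
    heap_array[0] = 1;

    int result = data_array[0] + static_local[0] + stack_array[0] + heap_array[0];
    free(heap_array);
    return result;
}
```

The first call returns 4 and the second returns 5, because only static_local persists and accumulates.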

Now let's see how one-dimensional arrays are accessed.

If you access an array with a constant index (which may be a #defined value, but not a const variable in plain C), the compiler can calculate the element's address at compile time. If you have a true array in the Data section, it is accessed without any indirection. If you have a pointer (Heap) or an array on the Stack, the address is always relative to a base that is only known at run time (the pointer value or the stack pointer). So arrays in the Data section may be very slightly faster with this type of access. But this is not a difference that will change the world.

If you access an array with an index variable, the access essentially always decays to pointer arithmetic, since the index may change (for example, by being incremented in a for loop). The generated code will likely be very similar, or even identical, for all three storage locations here.

Bringing in more dimensions

If you declare an array with two or more dimensions and access it partially or fully with constants, an intelligent compiler may well optimize these constants out as above.

If you access by indices, note that memory is linear. If the later dimensions of a true array are not powers of 2, the compiler will need to generate multiplications. For example, in the array int arr[4][12]; the second dimension is 12. If you access it as arr[i][j], where i and j are index variables, the linear memory has to be indexed as 12 * i + j. So the compiler has to generate code that multiplies by a constant here. The complexity depends on how "far" the constant is from a power of 2. Here the resulting code will likely look somewhat like calculating (i<<3) + (i<<2) + j to access the element in the array.
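This C sketch shows that arithmetic for int arr[4][12]. It includes the plain multiply, the shift-and-add form a compiler might emit (12 = 8 + 4), and a check that both match the byte offset of &arr[i][j] from the start of the array. The function names are hypothetical, and whether your compiler actually emits shifts is up to its optimizer.

```c
#include <stddef.h>

/* Linear element offset of arr[i][j] in int arr[4][12]. */
size_t offset_mul(size_t i, size_t j)   { return 12 * i + j; }

/* Same value without a multiply: 12*i == 8*i + 4*i == (i<<3) + (i<<2). */
size_t offset_shift(size_t i, size_t j) { return (i << 3) + (i << 2) + j; }

/* Returns 1 if both formulas agree with the real layout for every element. */
int offsets_agree(void)
{
    int arr[4][12];   /* only addresses are used, so no initialization needed */
    for (size_t i = 0; i < 4; i++) {
        for (size_t j = 0; j < 12; j++) {
            /* byte distance from the first element, computed with char pointers */
            ptrdiff_t bytes = (const char *)&arr[i][j] - (const char *)&arr[0][0];
            if (offset_mul(i, j) != offset_shift(i, j)) return 0;
            if ((size_t)bytes != offset_mul(i, j) * sizeof(int)) return 0;
        }
    }
    return 1;
}
```

For example, arr[2][5] sits at offset 12*2 + 5 = 29, which the shift form computes as 16 + 8 + 5.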

If you build up the two-dimensional "array" from pointers, the sizes of the dimensions do not matter, since there are reference pointers in your structure. Here, if you can write arr[i][j], you must have declared it, for example, as int* arr[4] and then malloc()ed four chunks of memory of 12 ints each into it. Note that your four pointers (which the compiler can now use as bases) also consume memory that a true array would not need. Also note that here the generated code will contain a double indirection: first the code loads a pointer from arr by i, then it loads an int from that pointer by j.
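Here is a C sketch of the pointer-built layout just described: four separately malloc()ed rows of 12 ints, with every access going through a row pointer first. The names build_rows and free_rows are hypothetical. The extra memory is the 4 * sizeof(int *) bytes of arr itself, plus whatever bookkeeping malloc() keeps for each of the four allocations.

```c
#include <stdlib.h>

/* Hypothetical helper: point each arr[i] at its own block of 12 ints,
 * filled with 12*i + j. Returns 0 on success, -1 if an allocation fails. */
int build_rows(int *arr[4])
{
    for (int i = 0; i < 4; i++) {
        arr[i] = malloc(12 * sizeof *arr[i]);
        if (arr[i] == NULL) {
            while (i-- > 0) free(arr[i]);   /* undo the rows already built */
            return -1;
        }
        for (int j = 0; j < 12; j++)
            arr[i][j] = 12 * i + j;         /* arr[i][j]: load arr[i], then index by j */
    }
    return 0;
}

void free_rows(int *arr[4])
{
    for (int i = 0; i < 4; i++) free(arr[i]);
}
```

No multiplication by 12 is needed to reach a row here, but the four rows may end up anywhere on the heap.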

If the lengths are "far" from powers of 2 (so that complex "multiply by constant" code would have to be generated to access the elements), then using pointers may produce faster access code.

As James Kanze mentioned in his answer, in some circumstances the compiler may be able to optimize access to true multi-dimensional arrays. This kind of optimization is impossible for arrays composed from pointers, because the "array" is actually not a linear chunk of memory in that case.

Locality matters

If you are developing for the usual desktop/mobile architectures (Intel/ARM, 32/64-bit processors), locality also matters, because it determines what is likely to be sitting in the cache. If your variables are already in the cache for some reason, they will be accessed faster.

In terms of locality, the Stack is generally the winner. It is used so frequently that it is very likely to be in the cache already, so small arrays are best put there.

Using true multi-dimensional arrays instead of composing them from pointers may also help here. A true array is always a linear chunk of memory, so it usually needs fewer cache lines to be loaded. A scattered pointer composition (that is, separately malloc()ed chunks) might instead need more cache lines. It may also raise cache line conflicts, depending on where the chunks physically ended up on the heap.

As to which choice provides better performance, the answer will largely depend on your specific circumstances. The only way to know whether one way is better, or whether they are roughly equivalent, is to measure the performance of your application.

Some relevant factors are: how often you do it, the actual size of the arrays/data, how much memory your system has, and how well your system manages memory.
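Since measuring is the only reliable answer, here is a minimal C timing sketch using the standard clock() function. It sums the same values stored in a true 2D array and in a pointer-built one. ROWS, COLS and the function names are arbitrary assumptions. A single pass is noisy, so in real use you would repeat each pass, use an optimized build, and substitute your application's actual sizes and access pattern.

```c
#include <stdlib.h>
#include <time.h>

#define ROWS 1000   /* assumed sizes, for illustration only */
#define COLS 1000

static int flat[ROWS][COLS];          /* true 2D array (static storage) */

static long long sum_flat(void)
{
    long long s = 0;
    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++)
            s += flat[i][j];
    return s;
}

static long long sum_jagged(int **rows)
{
    long long s = 0;
    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++)
            s += rows[i][j];
    return s;
}

/* Fills both layouts identically, times one summing pass over each, and
 * returns 1 if the sums agree, 0 if not, -1 if allocation fails. */
int compare_layouts(double *t_flat, double *t_jagged)
{
    int **rows = malloc(ROWS * sizeof *rows);
    if (rows == NULL) return -1;
    for (int i = 0; i < ROWS; i++) {
        rows[i] = malloc(COLS * sizeof *rows[i]);
        if (rows[i] == NULL) {
            while (i-- > 0) free(rows[i]);
            free(rows);
            return -1;
        }
        for (int j = 0; j < COLS; j++)
            flat[i][j] = rows[i][j] = i + j;
    }

    clock_t c0 = clock();
    long long a = sum_flat();
    clock_t c1 = clock();
    long long b = sum_jagged(rows);
    clock_t c2 = clock();

    *t_flat = (double)(c1 - c0) / CLOCKS_PER_SEC;    /* seconds of CPU time */
    *t_jagged = (double)(c2 - c1) / CLOCKS_PER_SEC;

    for (int i = 0; i < ROWS; i++) free(rows[i]);
    free(rows);
    return a == b;
}
```

Checking that both sums agree keeps the comparison honest, because both loops must do the same work.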

If you have the luxury of choosing between the two, the sizes must already be nailed down. In that case you do not need the multiple-allocation scheme that you illustrated. You can perform a single dynamic allocation of your 2D array. In C:

#include <stdlib.h>

int (*array)[COLUMNS];                  /* pointer to a row of COLUMNS ints */
array = malloc(ROWS * sizeof(*array));  /* one block holding ROWS rows */

In C++:

std::vector<std::array<int, COLUMNS>> array(ROWS);

As long as COLUMNS is nailed down, you can perform a single allocation to obtain your 2D array. If neither dimension is nailed down, you don't really have the option of using a static array anyway.
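Here is a self-contained C sketch of the single-allocation approach. It assumes ROWS and COLUMNS are compile-time constants (2 and 3 here, chosen for illustration), and the names make_grid and grid_row are hypothetical. A single free() releases the whole array.

```c
#include <stdlib.h>

#define ROWS 2       /* assumed compile-time sizes */
#define COLUMNS 3

typedef int grid_row[COLUMNS];   /* one row: COLUMNS contiguous ints */

/* Allocates ROWS rows in one block and fills them 0..ROWS*COLUMNS-1.
 * Returns NULL if malloc fails. */
grid_row *make_grid(void)
{
    grid_row *array = malloc(ROWS * sizeof *array);  /* single allocation */
    if (array == NULL) return NULL;
    int number = 0;
    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLUMNS; j++)
            array[i][j] = number++;   /* indexed as i * COLUMNS + j, like a true array */
    return array;
}
```

Because the row length is part of the pointer's type, array[i][j] compiles to the same row-major arithmetic as a true 2D array, and no row-pointer table is needed.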

The usual way of implementing a two-dimensional array in C++ would be to wrap it in a class that uses std::vector<int>, with accessors that calculate the index. However:

Any question concerning optimization can only be answered by measuring. Even then, the answer is only valid for the compiler you are using, on the machine on which you did the measurements.

If you write:

int array[2][3] = { ... };

and then something like:

for ( int i = 0; i != 2; ++ i ) {
    for ( int j = 0; j != 3; ++ j ) {
        //  do something with array[i][j]...
    }
}

It's hard to imagine a compiler that doesn't actually generate something along the lines of:

for ( int* p = &array[0][0]; p != &array[0][0] + 2 * 3; ++ p ) {
    //  do something with *p
}

This is one of the most fundamental optimizations around, and has been for at least 30 years.
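This C sketch makes the equivalence concrete. The nested loop and a single pointer walking from &array[0][0] over 2 * 3 ints visit the same elements in the same order, because a true 2D array is one contiguous row-major block. The function names are hypothetical. The C standard does not formally bless walking one pointer from the first row into the next, but this layout is exactly what compilers exploit for the optimization.

```c
#include <stddef.h>

/* Nested-loop traversal, as in the answer: copies elements in visit order. */
void visit_nested(int array[2][3], int out[6])
{
    size_t k = 0;
    for (int i = 0; i != 2; ++i)
        for (int j = 0; j != 3; ++j)
            out[k++] = array[i][j];
}

/* Single-pointer traversal, roughly what an optimizer can turn the above into. */
void visit_flat(int array[2][3], int out[6])
{
    size_t k = 0;
    for (const int *p = &array[0][0]; p != &array[0][0] + 2 * 3; ++p)
        out[k++] = *p;
}
```

With the pointer-to-pointer layout, no single pointer can step through all the rows, which is why the optimization is unavailable there.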

If you dynamically allocate as you propose, the compiler will not be able to apply this optimization. Even for a single access, the matrix has poorer locality and requires more memory accesses, so it would likely perform worse.

If you're using C++, you would normally write a Matrix class that uses std::vector<int> for the memory and calculates the indexes explicitly with multiplication. (The improved locality will probably result in better performance, despite the multiplication.) This could make it more difficult for the compiler to do the above optimization. If that turns out to be an issue, you can always provide specialized iterators for that one case. You end up with more readable and more flexible code (e.g. the dimensions don't have to be constant), at little or no loss of performance.
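The examples here are in C, so below is a hypothetical C analogue of that Matrix class. It uses one flat buffer from calloc(), dimensions chosen at run time, and an accessor that computes row * cols + col. The names matrix, matrix_init, matrix_at and matrix_free are made up for illustration. The C++ version would use std::vector<int> as the answer describes.

```c
#include <stdlib.h>

/* Hypothetical flat-buffer matrix: rows * cols ints in one allocation. */
typedef struct {
    size_t rows, cols;
    int *data;
} matrix;

/* Allocates a zero-filled rows x cols matrix. Returns 0 on success, -1 on failure. */
int matrix_init(matrix *m, size_t rows, size_t cols)
{
    m->rows = rows;
    m->cols = cols;
    m->data = calloc(rows * cols, sizeof *m->data);
    return m->data != NULL ? 0 : -1;
}

/* Address of element (row, col), using the row-major index row * cols + col. */
int *matrix_at(matrix *m, size_t row, size_t col)
{
    return &m->data[row * m->cols + col];
}

void matrix_free(matrix *m)
{
    free(m->data);
    m->data = NULL;
}
```

The dimensions are ordinary run-time values here, yet the storage stays one contiguous block with the same locality as a true array.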

There is often a trade-off between memory consumption and speed. Empirically, I have seen that creating an array on the stack is faster than allocating it on the heap, and the difference becomes more apparent as the array size increases.

You can often decrease memory consumption, for example by using short or char instead of int where the range of values allows it.

As the array size increases, especially when using realloc(), there may be considerably more paging (in and out) and copying to keep the items in contiguous locations.

You should also consider that the stack has a much lower size limit for the things you can store in it. The heap's limit is higher but, as I said, comes at a cost in performance.

Stack memory allocation offers quicker access to data than the heap. The CPU first looks for the data in its cache, and if it does not find it there, it fetches it from main memory. Because the stack is used so often, its memory is likely to already be in the cache, which makes it the preferred location for small, frequently accessed data.

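Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.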

 