Data locality for implementing a 2D array in C/C++

A long time ago, inspired by "Numerical Recipes in C", I started using the following construct for storing matrices (2D arrays).

double **allocate_matrix(int NumRows, int NumCol)
{
  double **x;
  int i;

  x = (double **)malloc(NumRows * sizeof(double *));
  for (i = 0; i < NumRows; ++i) x[i] = (double *)calloc(NumCol, sizeof(double));
  return x;
}

double **x = allocate_matrix(1000,2000);
x[m][n] = ...;

But recently I noticed that many people implement matrices as follows:

double *x = (double *)malloc(NumRows * NumCol * sizeof(double));
x[NumCol * m + n] = ...;

From the locality point of view the second method seems perfect, but its readability is awful. So I started to wonder: is my first method, with its auxiliary array of row pointers (the double ** approach), really that bad, or will the compiler eventually optimize it so that it performs more or less the same as the second method? I am suspicious because I think the first method makes two jumps when accessing a value, x[m] and then x[m][n], and there is a chance that each time the CPU will first load the x array and then the x[m] array.

P.S. Do not worry about the extra memory for storing the double ** row pointers; for large matrices it is just a small percentage.

P.P.S. Since many people did not understand my question very well, I will try to rephrase it: am I right that the first method is a kind of locality hell, where each time x[m][n] is accessed the x array is first loaded into the CPU cache and then the x[m] array is loaded, so that every access runs at the speed of talking to RAM? Or am I wrong, and the first method is also OK from the data-locality point of view?

For C-style allocations you can actually have the best of both worlds:

double **allocate_matrix(int NumRows, int NumCol)
{
  double **x;
  int i;

  x = (double **)malloc(NumRows * sizeof(double *));
  x[0] = (double *)calloc(NumRows * NumCol, sizeof(double)); // <<< single contiguous memory allocation for entire array
  for (i = 1; i < NumRows; ++i) x[i] = x[i - 1] + NumCol;
  return x;
}

This way you get data locality and its associated cache/memory access benefits, and you can treat the array as a double ** (x[i][j]) or as a flattened 2D array (array[i * NumCol + j], where array is x[0]) interchangeably. You also have fewer allocation and free calls (2 versus NumRows + 1).
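As a rough sketch of how the single-block layout above might be paired with a matching deallocation routine, here is a C++ rendering of the same idea. The function names allocate_matrix_contiguous and free_matrix_contiguous are hypothetical, the null checks are an addition, and the code assumes NumRows >= 1.

```cpp
#include <cstdlib>

// Hypothetical helper names; they are illustrative, not part of the answer above.
// Assumes NumRows >= 1 and NumCol >= 1.
double **allocate_matrix_contiguous(int NumRows, int NumCol)
{
    double **x = static_cast<double **>(std::malloc(NumRows * sizeof(double *)));
    if (x == nullptr) return nullptr;

    // One zero-initialised block holds every element of the matrix.
    x[0] = static_cast<double *>(
        std::calloc(static_cast<std::size_t>(NumRows) * NumCol, sizeof(double)));
    if (x[0] == nullptr) {
        std::free(x);
        return nullptr;
    }

    // Each row pointer is just an offset into that single block.
    for (int i = 1; i < NumRows; ++i) x[i] = x[i - 1] + NumCol;
    return x;
}

void free_matrix_contiguous(double **x)
{
    if (x == nullptr) return;
    std::free(x[0]);  // the data block
    std::free(x);     // the row-pointer table
}
```

With this layout, x[i][j] and x[0][i * NumCol + j] refer to the same element, and exactly two free calls release everything.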

No need to guess whether the compiler will optimize the first method. Just use the second method, which you know is fast, and use a wrapper class that implements, for example, these methods:

double& operator()(int x, int y);
double const& operator()(int x, int y) const;

... and access your objects like this:

arr(2, 3) = 5;
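A minimal sketch of such a wrapper might look like the following. The class name Matrix, its member names, the use of std::vector for storage, and the (row, col) argument order are all assumptions made for illustration; the answer only specifies the two operator() signatures.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical wrapper; class and member names are illustrative only.
class Matrix {
public:
    Matrix(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols) {}  // elements start at 0.0

    // Row-major index computation is hidden behind operator().
    double& operator()(std::size_t row, std::size_t col) {
        assert(row < rows_ && col < cols_);
        return data_[row * cols_ + col];
    }
    double const& operator()(std::size_t row, std::size_t col) const {
        assert(row < rows_ && col < cols_);
        return data_[row * cols_ + col];
    }

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

private:
    std::size_t rows_;
    std::size_t cols_;
    std::vector<double> data_;  // single contiguous block
};
```

Given Matrix arr(4, 5);, the statement arr(2, 3) = 5; writes to the flat index 2 * 5 + 3 of the underlying block.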

Alternatively, if you can bear a little more code complexity in the wrapper class(es), you can implement a class that can be accessed with the more traditional arr[2][3] = 5; syntax. This is implemented in a dimension-agnostic way in the Boost.MultiArray library, but you can do your own simple implementation too, using a proxy class.
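One possible shape for a hand-rolled proxy is sketched below. It is not how Boost.MultiArray is written; the names Matrix2D, Row and ConstRow are hypothetical, and bounds checking is left out to keep the sketch short.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical class; names are illustrative and unrelated to Boost.MultiArray.
class Matrix2D {
public:
    // Proxy returned by the first operator[]; it refers to the start of one row.
    class Row {
    public:
        explicit Row(double* first) : first_(first) {}
        double& operator[](std::size_t col) const { return first_[col]; }
    private:
        double* first_;
    };

    // Read-only counterpart used when the matrix itself is const.
    class ConstRow {
    public:
        explicit ConstRow(const double* first) : first_(first) {}
        const double& operator[](std::size_t col) const { return first_[col]; }
    private:
        const double* first_;
    };

    Matrix2D(std::size_t rows, std::size_t cols)
        : cols_(cols), data_(rows * cols) {}  // elements start at 0.0

    // arr[row] yields a proxy; the proxy's own operator[] selects the column.
    Row operator[](std::size_t row) { return Row(data_.data() + row * cols_); }
    ConstRow operator[](std::size_t row) const {
        return ConstRow(data_.data() + row * cols_);
    }

private:
    std::size_t cols_;
    std::vector<double> data_;  // single contiguous block, row-major
};
```

The storage stays a single flat block, so arr[2][3] = 5; costs one multiplication and one addition rather than a second pointer load.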

Note: Considering your use of C style (a hardcoded non-generic "double" type, plain pointers, variable declarations at the start of the function, and malloc), you will probably need to get more familiar with C++ constructs before you can implement either of the options I mentioned.

The two methods are quite different.

  • While the first method allows for easier direct access to the values by adding another indirection (the double** array, hence you need 1+N mallocs), ...
  • ... the second method guarantees that ALL values are stored contiguously and only requires one malloc.

I would argue that the second method is always superior. Malloc is an expensive operation, and contiguous memory is a huge plus, depending on the application.

In C++, you'd just implement it like this:

std::vector<double> matrix(NumRows * NumCols);
matrix[y * NumCols + x] = value;  // Access

and if you're concerned about the inconvenience of having to compute the index yourself, add a wrapper that implements operator()(int x, int y).

You are also right that the first method is more expensive when accessing the values, because you need two memory lookups, as you described: x[m] and then x[m][n]. There is no way the compiler will "optimize this away". The first array, depending on its size, will be cached, and the performance hit may not be that bad. In the second case, you need an extra multiplication for direct access.
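To make the two access patterns concrete, here is a small sketch that sums a matrix both ways. The function names are hypothetical. In the pointer-table version the row pointer is kept in a local variable, so at the source level the x[m] lookup happens once per row rather than once per element; no claim is made here about what any particular compiler does with either loop.

```cpp
#include <cstddef>

// Hypothetical function names, for illustration only.

// First method: a table of row pointers, two lookups per element in general.
double sum_pointer_table(double* const* x, std::size_t NumRows, std::size_t NumCol)
{
    double total = 0.0;
    for (std::size_t m = 0; m < NumRows; ++m) {
        const double* row = x[m];        // first lookup, written once per row
        for (std::size_t n = 0; n < NumCol; ++n)
            total += row[n];             // second lookup, sequential within the row
    }
    return total;
}

// Second method: one flat block, one lookup plus an index computation.
double sum_flat(const double* x, std::size_t NumRows, std::size_t NumCol)
{
    double total = 0.0;
    for (std::size_t m = 0; m < NumRows; ++m)
        for (std::size_t n = 0; n < NumCol; ++n)
            total += x[NumCol * m + n];  // multiplication and addition, then a single load
    return total;
}
```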

In the first method you use, the double* entries in the master array point to logical rows (arrays of size NumCol).

So, if you write something like the following (pseudocode), you get the benefits of data locality in some sense:

foreach(row in rows):
    foreach(elem in row):
        //Do something

If you tried the same thing with the second method, and element access was done the way you specified (i.e. x[NumCol*m + n]), you would still get the same benefit. This is because you treat the array as being in row-major order. If you tried the same pseudocode while accessing the elements in column-major order, I assume you'd get cache misses, provided the array is large enough.
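The difference between the two traversal orders can be sketched as follows for the flat layout. The function names are hypothetical; both functions compute the same sum and differ only in the order in which memory is touched.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical function names, for illustration only.
// Both assume x holds NumRows * NumCol elements in row-major order.

// Row-major traversal: the inner loop walks consecutive addresses.
double sum_row_major(const std::vector<double>& x,
                     std::size_t NumRows, std::size_t NumCol)
{
    double total = 0.0;
    for (std::size_t m = 0; m < NumRows; ++m)
        for (std::size_t n = 0; n < NumCol; ++n)
            total += x[NumCol * m + n];
    return total;
}

// Column-major traversal of the same data: the inner loop jumps
// NumCol elements at a time, so consecutive reads are far apart.
double sum_column_major(const std::vector<double>& x,
                        std::size_t NumRows, std::size_t NumCol)
{
    double total = 0.0;
    for (std::size_t n = 0; n < NumCol; ++n)
        for (std::size_t m = 0; m < NumRows; ++m)
            total += x[NumCol * m + n];
    return total;
}
```

How much slower the second version is depends on the matrix size and the hardware, so it is worth measuring on the target machine rather than assuming a figure.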

In addition, the second method has the desirable property of being a single contiguous block of memory, which further improves performance even when you loop through multiple rows (unlike the first method).

So, in conclusion, the second method should be much better in terms of performance.

If NumCol is a compile-time constant, or if you are using GCC with language extensions enabled, then you can do:

double (*x)[NumCol] = (double (*)[NumCol]) malloc(NumRows * sizeof (double[NumCol]));

and then use x as a 2D array, and the compiler will do the indexing arithmetic for you. The caveat is that unless NumCol is a compile-time constant, ISO C++ won't let you do this, and if you rely on GCC language extensions your code may not port to a compiler that lacks them.
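For the portable case, where NumCol is a compile-time constant, a sketch could look like this. The value 4, the alias Row and the helper name allocate_rows are assumptions for illustration, and calloc is used instead of malloc so that the elements start at zero.

```cpp
#include <cstdlib>

// Assumed compile-time constant; the value 4 is arbitrary.
constexpr std::size_t NumCol = 4;

// Hypothetical alias: one row is an array of NumCol doubles.
using Row = double[NumCol];

// Hypothetical helper: returns a pointer to the first row of a zero-initialised
// NumRows x NumCol block, or nullptr if the allocation fails.
Row* allocate_rows(std::size_t NumRows)
{
    return static_cast<Row*>(std::calloc(NumRows, sizeof(Row)));
}
```

A pointer obtained as Row* x = allocate_rows(3); can then be indexed as x[1][2], with the compiler generating the 1 * NumCol + 2 arithmetic, and the whole block is released with a single std::free(x).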
