Is a two-dimensional array implemented as a contiguous one-dimensional array?

I have a question about the memory layout of a two-dimensional array. When we define one, such as int a[3][4], is the memory allocated to this array contiguous?

In other words, is a two-dimensional array implemented as a contiguous one-dimensional array?

If the answer is yes, is accessing a[0][6] equivalent to accessing a[1][2]?

I wrote the following C program.

#include <stdio.h>
int main(){
    int a[3][4] = {{1, 2, 3, 4},
                   {5, 6, 7, 8},
                   {9, 10, 11, 12}};
    printf("%d %d\n", a[0][6], a[1][2]);
    return 0;
}

I find that the output is 7 7.

a[0][6] seems illegal, but it refers to the same element as a[1][2]. I want to know why, and whether such an operation is legal.

This is an interesting case. Section 6.2.5p20 of the C standard defines an array as follows:

An array type describes a contiguously allocated nonempty set of objects with a particular member object type, called the element type. The element type shall be complete whenever the array type is specified. Array types are characterized by their element type and by the number of elements in the array. An array type is said to be derived from its element type, and if its element type is T, the array type is sometimes called ''array of T''. The construction of an array type from an element type is called ''array type derivation''.

So an array is a contiguous set of objects of a particular type. In the case of int a[3][4], it is an array of size 3 whose objects are also arrays. The type of each subarray is int [4], i.e. an array of size 4 of type int.

This means that a 2D array, or more accurately an array of arrays, does indeed have all of the individual members of the inner arrays laid out contiguously.

This does not, however, mean that the above array can be accessed as a[0][6] as in your example.

Section 6.5.2.1p2 regarding the array subscript operator [] states:

The definition of the subscript operator [] is that E1[E2] is identical to (*((E1)+(E2)))

And section 6.5.6p8, regarding the + operator when applied to a pointer operand, states:

When an expression that has integer type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integer expression. In other words, if the expression P points to the i-th element of an array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N (where N has the value n) point to, respectively, the i+n-th and i−n-th elements of the array object, provided they exist. Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q)-1 points to the last element of the array object. If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined. If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.

There's a lot to digest here, but the important part is that given an array of size X, the valid array indices range from 0 to X-1, and attempting to use any other index triggers undefined behavior. In particular, since a[0] has type int [4], attempting to access a[0][6] goes outside the bounds of the array a[0].

So while a[0][6] would probably work in practice, the C standard makes no guarantee that it will. And given that modern optimizing compilers aggressively assume that undefined behavior doesn't exist in a program and optimize based on that assumption, you could end up in a situation where something goes wrong and you don't know why.

To summarize: yes, 2D arrays are implemented that way, and no, you can't access them like that.

@dbush has written a good, correct answer explaining what's guaranteed and allowed. The TL;DR is basically: yes, it is contiguous, but no guarantees are made that allow you to reliably access any (sub)array out of bounds. Any pointer to an item needs to point within a valid array in order to use pointer arithmetic or the [] operator on it.

This answer adds some possible work-arounds to solve the problem.

One work-around is to use a union and "type pun" between a 2D array and a 1D array:

#include <stdio.h> 

typedef union
{
  int arr_2d [3][4];
  int arr_1d [3*4];
} arr_t;

int main (void)
{
  arr_t a = 
  { 
    .arr_2d =  
    {
      { 1, 2, 3, 4},
      { 5, 6, 7, 8},
      {9, 10, 11, 12}
    }
  };

  printf("%d %d\n", a.arr_1d[6], a.arr_1d[(1*4)+2]); // well-defined, union type punning
  return 0;
}

Another universal work-around is that it's fine to inspect any variable in C as a chunk of raw data by using character types, in which case you can regard the whole variable as an array of bytes:

#include <stdio.h>
int main (void){
    int a[3][4] = {{1, 2, 3, 4},
                   {5, 6, 7, 8},
                   {9, 10, 11, 12}};
    unsigned char* ptr = (unsigned char*)a;

    // well-defined: byte-wise pointer arithmetic on a character type
    // well-defined: dereferencing as int, since the effective type of the memory location is int:
    int x = *(int*)&ptr[sizeof(int)*6];
    int y = *(int*)&ptr[sizeof(int[4])*1 + sizeof(int)*2];

    printf("%d %d\n", x, y);
    return 0;
}

Or, in case you are a fan of obscure macros, here is a rewrite of the previous example (not really recommended):

#include <stdio.h>

// see 1)
#define GET_ITEM(arr, x, y) \
  _Generic( **(arr),        \
            int:  *(int*) &((unsigned char*)(arr))[sizeof(*(arr))*(x) + sizeof(int)*(y)] )

int main (void){
    int a[3][4] = {{1, 2, 3, 4},
                   {5, 6, 7, 8},
                   {9, 10, 11, 12}};

    // well-defined:
    printf("%d %d\n", GET_ITEM(a,0,6), GET_ITEM(a,1,2));
    return 0;
}

1) Explanation: _Generic is used for type safety. Cast to a character type, then do byte-wise pointer arithmetic based on the size of a 1D array of int for x and the size of an int for y. Precedence goes as [] over &. The & gets an address, which is then cast to an int* and dereferenced. The value is returned by the macro.

Back in the wild days, before compilers got serious about emitting diagnostics for doing (very likely) stupid things, this type of code was not uncommon.

NOTE: DO NOT DO THIS

#include <stdio.h>

int main(int argc, char* argv[]) {
    int a[3][4] = {{1, 2, 3, 4},
                   {5, 6, 7, 8},
                   {9, 10, 11, 12}};

    // treat the array as a single, contiguous arrangement
    // will throw compiler warning/error
    int *p = &a; // <=== VERY BAD, invalid pointer assignment!!

    printf("%d %d\n", p[6], a[1][2]);
    return 0;
}

This demonstrates that the array is in fact laid out contiguously in row-major order. This is one of those things about C: if you really want it to, it will let you shoot yourself right in the foot.
