
C99 nested arrays undefined behaviour

Original 2020-10-21 09:23:06 7 1 c/ multidimensional-array/ c99/ c-standard-library

In our lecture we recently took a look at the C99 standard on pointer equality (6.5.9.6) and applied it to nested arrays. There it states that pointers are only guaranteed to be equal if "one is a pointer to one past the end of one array object and the other is a pointer to the start of a different array object that happens to immediately follow the first array object in the address space".

The professor then explained that this is the reason the array access a[0][19] is technically undefined for a nested array with dimensions 4*5. Is this true? If so, why are negative indices defined then, e.g. a[1][-1]?

1 answer

Neither a[0][19] nor a[1][-1] has behavior defined by the C standard.

C 2018 6.5.2.1 2 tells us that array subscripting is defined in terms of pointer arithmetic:

A postfix expression followed by an expression in square brackets [] is a subscripted designation of an element of an array object. The definition of the subscript operator [] is that E1[E2] is identical to (*((E1)+(E2))) …

Thus a[0][19] is identical to *(a[0] + 19) (where some parentheses have been omitted because they are unnecessary), and a[1][-1] is identical to *(a[1] + -1).

In a[0] + 19 and a[1] + -1, a[0] and a[1] are arrays. In these expressions, they are automatically converted to pointers to their first elements, per C 2018 6.3.2.1 3. So these expressions are equivalent to p + 19 and q + -1, where p and q are the addresses of those first elements, &a[0][0] and &a[1][0], respectively.
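As a minimal sketch of the equivalence described above, restricted to in-range indices so that the behavior stays defined, the two helpers below read the same element once with subscripts and once with explicit pointer arithmetic. The names ROWS, COLS, read_subscript and read_pointer are illustrative assumptions, not taken from the question.

```c
#include <assert.h>
#include <stddef.h>

enum { ROWS = 4, COLS = 5 };

/* Read a[i][j] with the subscript operator. */
static int read_subscript(int (*a)[COLS], size_t i, size_t j)
{
    assert(i < ROWS && j < COLS);   /* stay inside the inner array */
    return a[i][j];
}

/* The same read spelled out: E1[E2] is (*((E1)+(E2))). */
static int read_pointer(int (*a)[COLS], size_t i, size_t j)
{
    assert(i < ROWS && j < COLS);
    int *p = *(a + i);   /* a[i] has type int[COLS]; it converts to &a[i][0] */
    return *(p + j);     /* arithmetic happens within the 5-element array a[i] */
}
```

For any i below ROWS and j below COLS, both functions return the same value, because the second is simply the expansion of the first.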

C 2018 6.5.6 8 defines pointer arithmetic:

If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integer expression. In other words, if the expression P points to the i-th element of an array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N (where N has the value n) point to, respectively, the i+n-th and i−n-th elements of the array object, provided they exist. Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q)-1 points to the last element of the array object. If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.

So p + 19 would point to element 19 of a[0] if it existed. But a[0] is an array of 5 elements, so element 19 does not exist, and therefore the behavior of p + 19 is not defined by the standard.

Similarly, q + -1 would point to element -1 of a[1], but element -1 does not exist, and therefore the behavior of q + -1 is not defined by the standard.
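To make the boundary concrete, here is a small sketch of what the quoted rules do allow: forming a pointer one past the end of an inner array, comparing it, and using it as a loop sentinel, without ever dereferencing it. The function names row_end_meets_next_row and sum_row are hypothetical, chosen only for this illustration.

```c
#include <stdbool.h>
#include <stddef.h>

enum { ROWS = 4, COLS = 5 };

/* a[0] + COLS points one past the last element of a[0], which is a valid
   pointer to form. Comparing it with &a[1][0] is the case described in the
   pointer equality rule cited in the question (6.5.9 paragraph 6).
   Dereferencing it, or forming a[0] + COLS + 1, would not be defined. */
static bool row_end_meets_next_row(int (*a)[COLS])
{
    int *one_past = a[0] + COLS;   /* must not be dereferenced */
    int *next_row = &a[1][0];
    return one_past == next_row;
}

/* Sum one inner array, using its one-past-the-end pointer as a sentinel. */
static int sum_row(int (*a)[COLS], size_t i)
{
    int total = 0;
    for (int *p = a[i]; p != a[i] + COLS; ++p)
        total += *p;               /* p always points inside a[i] here */
    return total;
}
```

The comparison is expected to yield true because a[1] immediately follows a[0] in memory. Note that equality of the two pointers still does not make it valid to reach a[1][0] by dereferencing a pointer derived from a[0].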

The fact that these arrays are contained within a larger array, and that we know the memory layout of all elements in this larger array, does not matter. The C standard does not define the behavior in terms of the larger memory layout; it specifies behavior based on the specific array in which pointer arithmetic is being evaluated. A C implementation would be free to make this arithmetic work like simple address arithmetic and to define the behavior if it desired, but it is also permitted not to do this. Compiler optimization has become more sophisticated and aggressive over the years, and it may transform these expressions based on the C standard's rules about specific array arithmetic without regard to the memory layout, and this can cause the expressions to fail (not behave as they would with simple address arithmetic).
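If the goal is to reach "element 19" of the whole 4*5 array, one way to stay within defined behavior is to split the flat index into a row and a column, so that every subscript is in range for the array it applies to. This is only a sketch under that assumption; read_flat is a hypothetical helper, not something from the question or the standard.

```c
#include <assert.h>
#include <stddef.h>

enum { ROWS = 4, COLS = 5 };

/* Read the k-th int of the array in row-major order without stepping a
   pointer outside the inner array it was derived from. */
static int read_flat(int (*a)[COLS], size_t k)
{
    assert(k < (size_t)ROWS * COLS);
    return a[k / COLS][k % COLS];   /* k = 19 becomes a[3][4] */
}
```

Under the same reasoning, the element that a[1][-1] is aiming for can be written as a[0][COLS - 1], which is well defined.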