
How does this piece of code determine array size without using sizeof( )?

Going through some C interview questions, I've found a question stating "How to find the size of an array in C without using the sizeof operator?", with the following solution. It works, but I cannot understand why.

#include <stdio.h>

int main() {
    int a[] = {100, 200, 300, 400, 500};
    int size = 0;

    size = *(&a + 1) - a;
    printf("%d\n", size);

    return 0;
}

As expected, it prints 5.

Edit: people pointed out this answer, but the syntax does differ a bit, i.e. the indexing method

size = (&arr)[1] - arr;

so I believe both questions are valid and have a slightly different approach to the problem. Thank you all for the immense help and thorough explanation!

When you add 1 to a pointer, the result is the location of the next object in a sequence of objects of the pointed-to type (i.e., an array). If p points to an int object, then p + 1 will point to the next int in a sequence. If p points to a 5-element array of int (in this case, the expression &a), then p + 1 will point to the next 5-element array of int in a sequence.

Subtracting two pointers (provided they both point into the same array object, or one is pointing one past the last element of the array) yields the number of objects (array elements) between those two pointers.

The expression &a yields the address of a, and has the type int (*)[5] (pointer to 5-element array of int). The expression &a + 1 yields the address of the next 5-element array of int following a, and also has the type int (*)[5]. The expression *(&a + 1) dereferences the result of &a + 1, such that it yields the address of the first int following the last element of a, and has type int [5], which in this context "decays" to an expression of type int *.

Similarly, the expression a "decays" to a pointer to the first element of the array and has type int *.

A picture may help:

int [5]  int (*)[5]     int      int *

+---+                   +---+
|   | <- &a             |   | <- a
| - |                   +---+
|   |                   |   | <- a + 1
| - |                   +---+
|   |                   |   |
| - |                   +---+
|   |                   |   |
| - |                   +---+
|   |                   |   |
+---+                   +---+
|   | <- &a + 1         |   | <- *(&a + 1)
| - |                   +---+
|   |                   |   |
| - |                   +---+
|   |                   |   |
| - |                   +---+
|   |                   |   |
| - |                   +---+
|   |                   |   |
+---+                   +---+

These are two views of the same storage: on the left, we're viewing it as a sequence of 5-element arrays of int, while on the right, we're viewing it as a sequence of int. I also show the various expressions and their types.

Be aware, the expression *(&a + 1) results in undefined behavior:

...
If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.

C 2011 Online Draft, 6.5.6/8

This line is the most important one:

size = *(&a + 1) - a;

As you can see, it first takes the address of a and adds one to it. Then, it dereferences that pointer and subtracts the original value of a from it.

Pointer arithmetic in C causes this to return the number of elements in the array, or 5. Adding one to &a yields a pointer to the next array of 5 ints after a. After that, this code dereferences the resulting pointer and subtracts a (an array type that has decayed to a pointer) from that, giving the number of elements in the array.

Details on how pointer arithmetic works:

Say you have a pointer xyz that points to an int type and contains the value (int *)160. When you subtract any number from xyz, C specifies that the actual amount subtracted from xyz is that number times the size of the type that it points to. For example, if you subtracted 5 from xyz, the resulting address would be what xyz - (sizeof(*xyz) * 5) would give if pointer arithmetic didn't apply (address 140 when an int is 4 bytes).

As a is an array of 5 int types, the resulting value will be 5. However, this only works with an array, not with a pointer. If you try this with a pointer p, you will not get the element count, because &p + 1 points just past the pointer variable itself rather than past the array it points into.

Here's a little example that shows the addresses and how this is undefined. The left-hand side shows the addresses:

a + 0 | [a[0]] | &a points to this
a + 1 | [a[1]]
a + 2 | [a[2]]
a + 3 | [a[3]]
a + 4 | [a[4]] | end of array
a + 5 | [a[5]] | &a+1 points to this; accessing past array when dereferenced

This means that the code is subtracting a from &a[5] (or a+5), giving 5.

Note that this is undefined behavior, and should not be used under any circumstances. Do not expect the behavior of this to be consistent across all platforms, and do not use it in production programs.

Hmm, I suspect this is something that would not have worked back in the early days of C. It is clever though.

Taking the steps one at a time:

  • &a gets a pointer to an object of type int[5]
  • +1 gets the next such object, assuming there is an array of those
  • * effectively converts that address into type pointer to int
  • -a subtracts the two int pointers, returning the count of int instances between them.

I'm not sure it is completely legal (by this I mean language-lawyer legal, not whether it will work in practice), given some of the type operations going on. For example, you are only "allowed" to subtract two pointers when they point to elements in the same array. *(&a+1) was synthesised by accessing another array, albeit a parent array, so it is not actually a pointer into the same array as a. Also, while you are allowed to synthesise a pointer past the last element of an array, and you can treat any object as an array of 1 element, the operation of dereferencing (*) is not "allowed" on this synthesised pointer, even though it has no behaviour in this case!

I suspect that in the early days of C (K&R syntax, anyone?), an array decayed into a pointer much more quickly, so the *(&a+1) might only return the address of the next pointer of type int**. The more rigorous definitions of modern C++ definitely allow the pointer to array type to exist and know the array size, and probably the C standards have followed suit. All C function code only takes pointers as arguments, so the technical visible difference is minimal. But I am only guessing here.

This sort of detailed legality question usually applies to a C interpreter, or a lint type tool, rather than the compiled code. An interpreter might implement a 2D array as an array of pointers to arrays, because there is one less runtime feature to implement, in which case dereferencing the +1 would be fatal, and even if it worked would give the wrong answer.

Another possible weakness may be that the C compiler might align the outer array. Imagine if this was an array of 5 chars (char arr[5]): when the program performs &a+1 it is invoking "array of array" behaviour. The compiler might decide that an array of array of 5 chars (char arr[][5]) is actually generated as an array of array of 8 chars (char arr[][8]), so that the outer array aligns nicely. The code we are discussing would now report the array size as 8, not 5. I'm not saying a particular compiler would definitely do this, but it might.
