Weird behaviour with variable length arrays in struct in C

I came across a concept which some people call a "Struct Hack", where we can declare a pointer variable inside a struct, like this:

struct myStruct{
    int data;
    int *array;
};

and later on, when we allocate memory for a struct myStruct using malloc in our main() function, we can simultaneously allocate memory for our int *array pointer in the same step, like this:

struct myStruct *p = malloc(sizeof(struct myStruct) + 100 * sizeof(int));

p->array = p+1;

instead of

struct myStruct *p = malloc(sizeof(struct myStruct));

p->array = malloc(100 * sizeof(int));

assuming we want an array of size 100.

The first option is said to be better since we would get a contiguous chunk of memory and we can free that whole chunk with one call to free(), versus two calls in the latter case.

Experimenting, I wrote this:

#include<stdio.h>
#include<stdlib.h>

struct myStruct{
    int i;
    int *array;
};

int main(){
    /* I ask for only 40 more bytes (10 * sizeof(int)) */

    struct myStruct *p = malloc(sizeof(struct myStruct) + 10 * sizeof(int)); 

    p->array = p+1; 

    /* I assign values way beyond the initial allocation*/
    for (int i = 0; i < 804; i++){
        p->array[i] = i;
    }

    /* printing*/
    for (int i = 0; i < 804; i++){
        printf("%d\n",p->array[i]);
    }

    return 0;
}

I am able to execute it without problems, without any segmentation faults. Looks weird to me.

I also came to know that C99 has a provision which says that instead of declaring an int *array inside a struct, we can write int array[]. I did this, using malloc() only for the struct, like this:

struct myStruct *p = malloc(sizeof(struct myStruct));

and initialising array[] like this:

p->array[10] = 0; /* I hope this sets the array size to 10 
                    and also initialises array entries to 0 */

But then again I see this weirdness, where I am able to access and assign array indices beyond the array size and also print the entries:

for(int i = 0; i < 296; i++){ // first loop
    p->array[i] = i;
}

for(int i = 0; i < 296; i++){ // second loop
    printf("%d\n",p->array[i]);
}

After printing p->array[i] till i = 296 it gives me a segmentation fault, but clearly it had no problems assigning beyond i = 9. (If I increment 'i' till 300 in the first for loop above, I immediately get a segmentation fault and the program doesn't print any values.)

Any clues about what's happening? Is it undefined behaviour or what?

EDIT: When I compiled the first snippet with the command

cc -Wall -g -std=c11 -O    struct3.c   -o struct3

I got this warning:

 warning: incompatible pointer types assigning to 'int *' from
  'struct str *' [-Wincompatible-pointer-types]
    p->array = p+1;

Yes, what you see here is an example of undefined behavior.

Writing beyond the end of an allocated array (aka buffer overflow) is a good example of undefined behavior: it will often appear to "work normally", while at other times it will crash (e.g. "Segmentation fault").

A low-level explanation: there are control structures in memory that are situated some distance from your allocated objects. If your program does a big buffer overflow, there is more chance it will damage these control structures, while a more modest overflow may only damage some unused data (e.g. padding). In any case, however, buffer overflows invoke undefined behavior.

The "struct hack" in your first form also invokes undefined behavior (as indicated by the warning), but of a special kind: it is almost guaranteed to work normally in most compilers. However, it is still undefined behavior, so it is not recommended. In order to sanction its use, the C committee invented the "flexible array member" syntax (your second syntax), which is guaranteed to work as long as you allocate enough memory for the elements you access.
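As a rough illustration of the flexible array member form, here is a minimal sketch. The names flexStruct, flex_new and flex_set, and the count field, are hypothetical and not from the question. The key point is that the malloc call itself reserves room for the elements; writing to p->array[i] never does.

```c
#include <assert.h>
#include <stdlib.h>

/* C99 flexible array member: it must be the last member of the struct. */
struct flexStruct {
    size_t count;   /* number of elements actually allocated for array[] */
    int array[];
};

/* Hypothetical helper: a single malloc covers the header plus n ints. */
static struct flexStruct *flex_new(size_t n)
{
    struct flexStruct *p = malloc(sizeof *p + n * sizeof p->array[0]);
    if (p == NULL)
        return NULL;
    p->count = n;
    for (size_t i = 0; i < n; i++)
        p->array[i] = 0;   /* malloc does not zero memory, so do it explicitly */
    return p;
}

/* Hypothetical bounds-checked store: aborts instead of silently overflowing. */
static void flex_set(struct flexStruct *p, size_t i, int value)
{
    assert(i < p->count);
    p->array[i] = value;
}
```

A caller would write something like struct flexStruct *p = flex_new(10); and later release everything with a single free(p). Keeping the element count inside the struct is one way to make out-of-range indices detectable rather than undefined.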

Just to make it clear: assignment to an element of an array never allocates space for that element (not in C, at least). In C, an element must already be allocated when you assign to it, even if the array is "flexible". Your code should know how much to allocate when it allocates memory. If you don't know how much to allocate, use one of the following techniques:

  • Allocate an upper bound:

    struct myStruct{
        int data;
        int array[100]; // you will never need more than 100 numbers
    };

  • Use realloc
  • Use a linked list (or any other sophisticated data structure)
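The realloc option could look roughly like the sketch below. It assumes a capacity-doubling policy with a starting capacity of 4, which is just one common choice. The names growStruct and grow_push are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical growable struct: tracks used and allocated slots. */
struct growStruct {
    size_t count;      /* elements in use */
    size_t capacity;   /* elements allocated */
    int array[];       /* flexible array member */
};

/* Append value, growing the block with realloc when it is full.
   Returns 0 on success, -1 on allocation failure (*pp stays valid).
   *pp may be NULL for an empty container. */
static int grow_push(struct growStruct **pp, int value)
{
    assert(pp != NULL);
    struct growStruct *p = *pp;
    size_t count = (p != NULL) ? p->count : 0;
    size_t capacity = (p != NULL) ? p->capacity : 0;

    if (count == capacity) {
        size_t new_capacity = (capacity == 0) ? 4 : capacity * 2;
        /* realloc(NULL, size) behaves like malloc(size) */
        struct growStruct *q =
            realloc(p, sizeof *q + new_capacity * sizeof q->array[0]);
        if (q == NULL)
            return -1;   /* the old block is untouched */
        q->count = count;
        q->capacity = new_capacity;
        *pp = p = q;     /* the block may have moved */
    }
    p->array[p->count++] = value;
    return 0;
}
```

Because realloc may move the block, the caller passes the address of its pointer and must not keep stale copies of the old pointer. The whole container is still released with one free().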

What you describe as a "Struct Hack" is indeed a hack. It is not worth it, IMO.

p->array = p+1;

will give you problems on many compilers, which will demand an explicit conversion:

p->array = (int *) (p+1);

"I am able to execute it without problems, without any segmentation faults. Looks weird to me."

It is undefined behaviour. You are accessing memory on the heap, and many compilers and operating systems will not prevent you from doing so. But it is extremely bad practice to rely on it.
