
GCC vs Clang copying struct flexible array member

Consider the following code snippet.

#include <stdio.h>

typedef struct s {
    int _;
    char str[];
} s;
s first = { 0, "abcd" };

int main(int argc, const char **argv) {
    s second = first;
    printf("%s\n%s\n", first.str, second.str);
}

When I compile this with GCC 7.2, I get:

$ gcc-7 -o tmp tmp.c && ./tmp
abcd
abcd

But when I compile this with Clang (Apple LLVM version 8.0.0 (clang-800.0.42.1)), I get the following:

$ clang -o tmp tmp.c && ./tmp
abcd
# Nothing here

Why does the output differ between the compilers? I would expect the string not to be copied, as it's a flexible array member (similar to this question). Why does GCC actually copy it?

Edit

Some comments and an answer suggested this might be due to optimization. GCC may make second an alias of first, so updating second should prevent GCC from doing that optimization. I added the line:

second._ = 1;

But this doesn't change the output.

Here's the real answer to what's going on with gcc. second is allocated on the stack, just as you'd expect. It is not an alias for first. This is easily verified by printing their addresses.

Additionally, the declaration s second = first; is corrupting the stack, because (a) gcc allocates only the minimum amount of storage for second, but (b) it copies all of first into second.

Here is a modified version of the original code which shows this:

#include <stdio.h>

typedef struct s {
    int _;
    char str[];
} s;
s first = { 0, "abcdefgh" };
int main(int argc, const char **argv) {
    char v[] = "xxxxxxxx";
    s second = first;
    printf("%p %p %p\n", (void *) v, (void *) &first, (void *) &second);
    printf("<%s> <%s> <%s>\n", v, first.str, second.str);
}

On my 32-bit Linux machine, with gcc, I get the following output:

0xbf89a303 0x804a020 0xbf89a2fc
<defgh> <abcdefgh> <abcdefgh>

As you can see from the addresses, v and second are on the stack, and first is in the data section. It is also clear that the initialization of second has overwritten v on the stack: instead of the expected <xxxxxxxx>, the program prints <defgh>.

This seems like a gcc bug to me. At the very least, it should warn that the initialization of second will corrupt the stack, since it clearly has enough information to know this at compile time.

Edit: I tested this some more, and obtained essentially equivalent results by splitting the declaration of second into:

s second;
second = first;

The real problem is the assignment. It's copying all of first, rather than the minimal common part of the structure type, which is what I believe it should do. In fact, if you move the static initialization of first into a separate file, the assignment does what it should do: v prints correctly, and second.str is undefined garbage. This is the behavior gcc should be producing, regardless of whether the initialization of first is visible in the same compilation unit.

So, for an answer: both compilers are behaving correctly, but the output you are getting is the result of undefined behavior.

GCC
Because you never modify second, GCC is simply making second an alias of first in its lookup table. Modify second and GCC cannot make that optimization, and you'll get the same answer/crash as Clang.

Clang
Clang does not automatically apply the same optimization, it seems. So when it copies the structure, it does so correctly: it copies the single int and nothing else.

You were lucky that there was a zero value on the stack after your local second variable, terminating your unknown character string. Basically, you are using an uninitialized pointer. Had there been no zero, you could have gotten a lot of garbage and a memory fault.

The purpose of a flexible array member is to do low-level work, such as implementing a memory manager, by casting some memory to your structure. The compiler is under no obligation to understand what you are doing; it is only under obligation to act as if you know what you are doing. If you fail to cast the structure type over memory that actually has data of that type in it, all bets are off.

Edit
So, using godbolt.org and looking at the assembly:

.LC0:
        .string "%s\n%s\n"
main:
        sub     rsp, 24
        mov     eax, DWORD PTR first[rip]
        mov     esi, OFFSET FLAT:first+4
        lea     rdx, [rsp+16]
        mov     edi, OFFSET FLAT:.LC0
        mov     DWORD PTR [rsp+12], eax
        xor     eax, eax
        call    printf
        xor     eax, eax
        add     rsp, 24
        ret
first:
        .long   0
        .string "abcd"

We see that GCC is, actually, doing exactly what I said with the OP's original code: treating second as an alias of first.

Tom Karzes has significantly modified the code, and so is experiencing a different issue. What he reports does appear to be a bug; I don't have time at the moment to figure out what is really happening with his stack-corrupting assignment.
