GCC vs Clang copying struct flexible array member
Consider the following code snippet.
#include <stdio.h>
typedef struct s {
    int _;
    char str[];
} s;
s first = { 0, "abcd" };
int main(int argc, const char **argv) {
    s second = first;
    printf("%s\n%s\n", first.str, second.str);
}
When I compile this with GCC 7.2, I get:
$ gcc-7 -o tmp tmp.c && ./tmp
abcd
abcd
But when I compile this with Clang (Apple LLVM version 8.0.0 (clang-800.0.42.1)), I get the following:
$ clang -o tmp tmp.c && ./tmp
abcd
# Nothing here
Why does the output differ between the compilers? I would expect the string not to be copied, as it's a flexible array member (similar to this question). Why does GCC actually copy it?
Edit
Some comments and an answer suggested this might be due to optimization: GCC may make second an alias of first, so updating second should prevent GCC from doing that optimization. I added the line:
second._ = 1;
But this doesn't change the output.
Here's the real answer to what's going on with gcc.
second is allocated on the stack, just as you'd expect. It is not an alias for first. This is easily verified by printing their addresses.
Additionally, the declaration s second = first; is corrupting the stack, because (a) gcc allocates only the minimum amount of storage for second (sizeof(s) bytes), but (b) it copies all of first into second, writing past the end of that storage.
Here is a modified version of the original code that shows this:
#include <stdio.h>
typedef struct s {
    int _;
    char str[];
} s;
s first = { 0, "abcdefgh" };
int main(int argc, const char **argv) {
    char v[] = "xxxxxxxx";
    s second = first;
    printf("%p %p %p\n", (void *) v, (void *) &first, (void *) &second);
    printf("<%s> <%s> <%s>\n", v, first.str, second.str);
}
On my 32-bit Linux machine, with gcc, I get the following output:
0xbf89a303 0x804a020 0xbf89a2fc
<defgh> <abcdefgh> <abcdefgh>
As you can see from the addresses, v and second are on the stack, and first is in the data section. Further, it is also clear that the initialization of second has overwritten v on the stack, with the result that instead of the expected <xxxxxxxx>, it is instead showing <defgh>. Since second is at 0xbf89a2fc, second.str starts 4 bytes later at 0xbf89a300; copying the 9 bytes of "abcdefgh" (including its NUL) there overwrites v at 0xbf89a303 starting with the 'd'.
This seems like a gcc bug to me. At the very least, it should warn that the initialization of second will corrupt the stack, since it clearly has enough information to know this at compile time.
Edit: I tested this some more, and obtained essentially equivalent results by splitting the declaration of second into:
s second;
second = first;
The real problem is the assignment. It's copying all of first, rather than only the fixed part of the structure type (sizeof(s) bytes), which is what I believe it should do. In fact, if you move the static initialization of first into a separate file, the assignment does what it should do: v prints correctly, and second.str is undefined garbage. This is the behavior gcc should be producing, regardless of whether the initialization of first is visible in the same compilation unit or not.
So, for an answer: both compilers are behaving correctly, but the output you are getting is the result of undefined behavior.
GCC
Because you never modify second, GCC is simply making second an alias of first in its lookup table. Modify second, and GCC cannot make that optimization; you'll get the same answer/crash as Clang.
Clang
Clang does not automatically apply the same optimization, it seems. So when it copies the structure, it does so correctly: it copies the single int and nothing else.
You were lucky that there was a zero value on the stack after your local second variable, terminating your unknown character string. Basically, you are reading uninitialized memory, much as if you were using an uninitialized pointer. Were there no zero, you could have gotten a lot of garbage and a memory fault.
The purpose of a flexible array member is to do low-level work, like implementing a memory manager, by casting some memory to your structure. The compiler is under no obligation to understand what you are doing; it is only obligated to act as if you know what you are doing. If you cast the structure type over memory that does not actually contain data of that type, all bets are off.
Edit
So, using godbolt.org and looking at the assembly:
.LC0:
        .string "%s\n%s\n"
main:
        sub     rsp, 24
        mov     eax, DWORD PTR first[rip]
        mov     esi, OFFSET FLAT:first+4
        lea     rdx, [rsp+16]
        mov     edi, OFFSET FLAT:.LC0
        mov     DWORD PTR [rsp+12], eax
        xor     eax, eax
        call    printf
        xor     eax, eax
        add     rsp, 24
        ret
first:
        .long   0
        .string "abcd"
We see that GCC is, actually, doing exactly what I said with the OP's original code: treating second as an alias of first.
Tom Karzes has significantly modified the code, and so is experiencing a different issue. What he reports does appear to be a bug; I don't have time at the moment to figure out what is really happening with his stack-corrupting assignment.