char array and pointer initialization semantics

In the following code snippet:

   char *str1 = "abcd";
   char str2[] = "defg";

I realize that the first statement stores a pointer to a string literal that lives in a read-only section of the executable, while the second one stores the string in a read-write section. On examining the generated instructions, I verified that the first one stores the address of "abcd", located in the rodata section, into str1.

What was interesting was the second statement. The compiler inserted code to store the values onto the stack:

char *str1 = "abcd";
8048420:       c7 44 24 10 20 85 04    movl   $0x8048520,0x10(%esp)
8048427:       08
char str2[] = "defg";
8048428:       c7 44 24 17 64 65 66    movl   $0x67666564,0x17(%esp)
804842f:       67
8048430:       c6 44 24 1b 00          movb   $0x0,0x1b(%esp)

How does the compiler decide which of the following to do?

  1. The string literal is stored in the rodata section.
  2. The string literal is stored in the data section (rw).
  3. The string memory is implicit in the stack, and instructions are generated to fill the stack.
  4. Are there any other possibilities, and are there variations among hardware?

Note: I am running a precise32 Vagrant box, using gcc with debug symbols and -O0.

When an aggregate object in memory is initialized with a compile-time aggregate value (which is not limited to string literals), the compiler always has a choice:

  1. Pre-build the complete initializer in the read-only data section at compile time, and then copy the whole thing into the modifiable target value with memcpy at run time.

  2. Generate code that builds the target value directly "in place", piece by piece, at run time.

Basically, the first is the "data-based" approach and the second is the "code-based" approach. In your case the compiler uses the code-based solution, probably because the literal is short. Use a longer literal and, I suspect, it will eventually switch to the first approach.
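As a rough illustration of the two strategies, here is a minimal C sketch that spells out by hand what a compiler may do behind `char str2[] = "defg";`. The names `defg_image`, `init_data_based`, and `init_code_based` are hypothetical, and a real compiler picks between these (or other) strategies using its own heuristics.

```c
#include <string.h>

/* Hypothetical stand-in for the image a compiler might place in .rodata. */
static const char defg_image[5] = "defg";

/* "Data-based": copy a pre-built read-only image into the target at run time. */
void init_data_based(char dst[5])
{
    memcpy(dst, defg_image, sizeof defg_image);
}

/* "Code-based": build the value in place with immediate stores.
   In the listing from the question, gcc merged the first four byte
   stores into a single movl of 0x67666564 (the bytes 'd','e','f','g'
   in little-endian order) and emitted a separate movb for the '\0'. */
void init_code_based(char dst[5])
{
    dst[0] = 'd';
    dst[1] = 'e';
    dst[2] = 'f';
    dst[3] = 'g';
    dst[4] = '\0';
}
```

Both functions leave the same five bytes in `dst`; they differ only in whether the bytes come from a stored image or from constants embedded in the instructions.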

One can imagine that in some cases a compiler might use a mixed approach: part of the data is pre-built somewhere and memcpy-ed from there, while the rest is built on the fly.

If your

char str2[] = "defg";

definition is inside a function, the compiler will generate instructions to put the data on the stack (ignoring possible optimizations, e.g. keeping values purely in registers). This works just as it does for other automatic (stack) variables.

It also has the option of copying the data to the stack from somewhere else instead of, e.g., encoding the data values as immediate operands of instructions. It might choose to do this for longer strings to avoid code bloat.

Regardless of what the compiler does, modifications to the contents of str2 must not be visible to the next invocation of the function (just as for other automatic variables).
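A minimal sketch of that guarantee, using a hypothetical function `first_char_automatic` that modifies its local array before returning:

```c
/* Returns the first character seen on entry, then overwrites it. */
char first_char_automatic(void)
{
    char str2[] = "defg";   /* automatic: initialized afresh on every call */
    char seen = str2[0];
    str2[0] = 'X';          /* this change is lost when the function returns */
    return seen;
}
```

Every call returns 'd', no matter how many times the function has already run, because each invocation gets a freshly initialized array.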

If str2 is global (which gives it static storage duration), the data will end up in the read/write data segment. This also happens if you give the array static storage duration inside the function, as in

static char str2[] = "defg";
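To see what static storage duration means in practice, consider this small sketch (the function name `first_char_static` is hypothetical). The array is initialized once, and changes to it persist between calls:

```c
/* Returns the first character seen on entry, then overwrites it. */
char first_char_static(void)
{
    static char str2[] = "defg";  /* initialized once, not on every call */
    char seen = str2[0];
    str2[0] = 'X';                /* this change survives until the next call */
    return seen;
}
```

The first call returns 'd' and every later call returns 'X', which is why the array has to live in writable storage that outlives the function's stack frame.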

When initializing a pointer with a string literal, as in

char *s = "defg";

the data ends up in the read-only data segment, and the rules for how the pointer itself is initialized with the address of the data are the same as above.
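One practical consequence, sketched below with hypothetical helper names: the pointer and the array are different kinds of object, and since modifying a string literal is undefined behavior in C, the literal should be copied into writable storage before it is changed.

```c
#include <stddef.h>
#include <string.h>

/* Size of the array object: four characters plus the terminating '\0'. */
size_t array_size(void)
{
    char str2[] = "defg";
    return sizeof str2;
}

/* Size of the pointer object only; it does not depend on the literal. */
size_t pointer_size(void)
{
    const char *s = "defg";
    return sizeof s;
}

/* Copies the literal into caller-provided storage and modifies the copy.
   Writing through s itself would be undefined behavior. */
void modifiable_copy(char out[5])
{
    const char *s = "defg";   /* points at the literal */
    memcpy(out, s, 5);        /* 4 characters + '\0' */
    out[0] = 'D';             /* fine: out is ordinary writable storage */
}
```

Declaring the pointer as `const char *` is a design choice here, not something the original snippet does; it lets the compiler reject accidental writes through the pointer.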
