char array and pointer initialization semantics

In the following code snippet

   char *str1 = "abcd";
   char str2[] = "defg";

I understand that the first statement stores a pointer to a string literal that lives in the read-only section of the executable, while the second creates an array in read-write memory. Examining the generated instructions confirms that the first statement stores the address of "abcd", located in the .rodata section, into str1.

What was interesting was the second statement. The compiler inserted code to store the values directly into the stack:

char *str1 = "abcd";
8048420:       c7 44 24 10 20 85 04    movl   $0x8048520,0x10(%esp)
8048427:       08
char str2[] = "defg";
8048428:       c7 44 24 17 64 65 66    movl   $0x67666564,0x17(%esp)
804842f:       67
8048430:       c6 44 24 1b 00          movb   $0x0,0x1b(%esp)

How does the compiler decide when to do which out of the following?

  1. The string literal is stored in rodata section
  2. The string literal is stored in data section (rw)
  3. The string memory is implicit in the stack and instructions are generated to fill the stack
  4. Are there any other possibilities as well and variations among hardware?

Note: I am running a precise32 Vagrant box, using gcc with debug symbols and -O0.

When an aggregate object in memory is initialized with a compile-time aggregate value (which is not limited to string literals), the compiler always has a choice between two approaches:

  1. Pre-build the complete initializer in read-only data section at compile time, and then just copy the whole thing into the modifiable target value by using memcpy at run time.

  2. Generate code that will directly build the target value "in-place" piece-by-piece at run time.

Basically, the first is the "data-based" approach and the second is the "code-based" approach. In your case the compiler uses the code-based approach, probably because the literal is short: the four characters fit in a single 32-bit immediate store (0x67666564 is "defg" in little-endian byte order), followed by a one-byte store for the terminating null. Use a longer literal and, I suspect, it will eventually switch to the first approach.

One can imagine that some compilers might use a mixed approach in some cases: part of the data is pre-built somewhere and copied from there with memcpy, and the rest is built on the fly.
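The two approaches can be imitated by hand in C. This is only a sketch of what the compiler may do internally, not what gcc actually emits; the names defg_image, init_data_based and init_code_based are made up for illustration.

```c
#include <string.h>

/* Hypothetical stand-in for the prebuilt initializer that a compiler
   might place in a read-only section. Holds 'd','e','f','g','\0'. */
static const char defg_image[5] = "defg";

/* Data-based approach: copy the prebuilt image into the target. */
void init_data_based(char dst[5])
{
    memcpy(dst, defg_image, sizeof defg_image);
}

/* Code-based approach: every byte is an immediate operand of a store,
   so no separate copy of the data is needed. */
void init_code_based(char dst[5])
{
    dst[0] = 'd';
    dst[1] = 'e';
    dst[2] = 'f';
    dst[3] = 'g';
    dst[4] = '\0';
}
```

Both functions leave the same five bytes in dst. The listing in the question corresponds to the second one, with the four character stores merged into a single movl.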

If your

char str2[] = "defg";

definition is inside a function, then the compiler will generate instructions to put the data on the stack (ignoring possible optimizations, e.g. keeping values purely in registers). This works just as it does for other automatic (stack) variables.

It also has the option of copying the data from somewhere else to the stack instead of, e.g., encoding the data values as immediate operands of instructions. It might choose to do this for longer strings to avoid code bloat.

Regardless of what the compiler does, modifications to the contents of str2 must not be visible to the next invocation of the function (just as for other automatic variables).
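A small sketch of that guarantee, using a hypothetical function first_char_then_clobber that is not part of the question:

```c
/* Returns the first character of str2 as seen on entry,
   then overwrites it. */
char first_char_then_clobber(void)
{
    char str2[] = "defg";   /* automatic: initialized again on every call */
    char seen = str2[0];
    str2[0] = 'X';          /* lost when the function returns */
    return seen;
}
```

Every call returns 'd', no matter how many times the function has already run.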

If str2 is global (which gives it static storage duration), then the data will end up in the read/write data segment. This also happens if you give the array static storage duration inside the function, as in

static char str2[] = "defg";
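For contrast, here is the same kind of sketch with static storage duration. The function name first_char_then_clobber_static is again hypothetical.

```c
/* Returns the first character of str2 as seen on entry,
   then overwrites it. */
char first_char_then_clobber_static(void)
{
    static char str2[] = "defg";   /* initialized once, before the first call */
    char seen = str2[0];
    str2[0] = 'X';                 /* persists across calls */
    return seen;
}
```

The first call returns 'd' and every later call returns 'X', because the array is a single object that lives for the whole program run and no initialization code is executed on entry to the function.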

When initializing a pointer with a string literal, as in

char *s = "defg";

the data typically ends up in the read-only data segment, and the rules for how the pointer itself is initialized with the address of the data are the same as above.
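Because writing through a pointer to a string literal is undefined behavior in C, a common habit is to declare such pointers as const char * and to copy the literal into an array when a modifiable string is needed. A minimal sketch, with the hypothetical helper names literal and writable_copy:

```c
#include <stddef.h>
#include <string.h>

/* A string literal has static storage duration, so returning a
   pointer to it is valid. The const guards against accidental writes. */
const char *literal(void)
{
    return "defg";
}

/* Copies the literal into caller-owned, writable storage of n bytes
   and capitalizes the first character of the copy. */
void writable_copy(char *dst, size_t n)
{
    if (n == 0)
        return;
    strncpy(dst, literal(), n - 1);
    dst[n - 1] = '\0';      /* strncpy does not always terminate */
    if (dst[0] != '\0')
        dst[0] = 'D';       /* modifies the copy, never the literal */
}
```

With a buffer of at least five bytes, writable_copy leaves "Defg" in the buffer while the literal itself stays "defg".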
