
Are there any internal differences between compound char arrays and string literals?

Are there any internal differences between this:

(const char[]){'H', 'e', 'l', 'l', 'o', 0}

and

"Hello"

in C?
I have noticed that string literals written with quotes are picked up by the Unix strings command, but the compound literals are not. What internal differences cause that?

String literals always have static storage duration, whereas compound literals at block scope have automatic storage duration.
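A minimal sketch of what the storage-duration difference means in practice. The function names greeting_literal and greeting_compound are made up for illustration: a pointer to a string literal can safely be returned, while the contents of a block-scope compound literal have to be copied out before the block ends.

```c
#include <stddef.h>

/* Returning a pointer to a string literal is fine: the literal has static
   storage duration, so it outlives the call. */
const char *greeting_literal(void) {
    return "Hello";
}

/* A block-scope compound literal lives only until the enclosing block ends,
   so its contents are copied into caller-owned storage instead of being
   returned by pointer. (Hypothetical helper, for illustration only.) */
void greeting_compound(char *out, size_t n) {
    const char *tmp = (const char[]){'H', 'e', 'l', 'l', 'o', 0};
    size_t i = 0;
    for (; i + 1 < n && tmp[i] != '\0'; i++)
        out[i] = tmp[i];
    if (n > 0)
        out[i] = '\0';
}
```

Returning tmp itself from greeting_compound would hand the caller a dangling pointer.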

String literals have special status in initializers, e.g.

char x[5] = "foo";  // ok
char y[5] = (char[]){'f','o','o','\0'};   // error

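String literals do not necessarily have unique addresses, e.g. "foo" == "foo" may or may not be true, and "foo" + 1 == "oo" may or may not be true, whereas the same comparisons between compound literals that are not const-qualified must be false (const-qualified compound literals, like string literals, are allowed to share storage).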
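The address rules can be checked with a small sketch like the following. The function names are hypothetical, and the result of the string literal comparison depends on the compiler and its options.

```c
#include <stdbool.h>

/* Two compound literals that are not const-qualified are always
   distinct objects, so their addresses must differ. */
bool compound_literals_distinct(void) {
    char *a = (char[]){'f', 'o', 'o', 0};
    char *b = (char[]){'f', 'o', 'o', 0};
    return a != b;   /* required to be true */
}

/* Identical string literals may or may not share storage. */
bool string_literals_shared(void) {
    const char *a = "foo";
    const char *b = "foo";
    return a == b;   /* unspecified: either result is allowed */
}
```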

This is probably because the strings command looks for runs of printable characters in the object file (by default, at least four in a row). When you declare a string with char *str = "Hello", your compiler stores the characters contiguously in a data section and gives them a label. GCC emits this data with the .string directive, whereas clang uses .asciz.

On the other hand, when you write the string as a compound literal array of chars at block scope, the compiler may build the array on the stack at run time, one byte at a time, so no .string or .asciz data is emitted for it.

Take this simple function:

void test(void) {
    (const char[]){'H', 'e', 'l', 'l', 'o', 0};  // compound literal, automatic storage
    char *str = "Hello";                          // pointer to a string literal
}

Below is the code generated by GCC without any optimisations. Clang generates similar code, but the GCC output is simpler, so I am using it here.

.LC0:
        .string "Hello"
test:
      push    rbp                                 ; function prologue
      mov     rbp, rsp                            ; function prologue
      mov     BYTE PTR [rbp-14], 72               ; ASCII H
      mov     BYTE PTR [rbp-13], 101              ; ASCII e
      mov     BYTE PTR [rbp-12], 108              ; ASCII l
      mov     BYTE PTR [rbp-11], 108              ; ASCII l
      mov     BYTE PTR [rbp-10], 111              ; ASCII o
      mov     BYTE PTR [rbp-9], 0                 ; ASCII null 
      mov     QWORD PTR [rbp-8], OFFSET FLAT:.LC0 ; string literal
      nop
      pop     rbp
      ret

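After the function prologue, the bytes of the compound literal are moved onto the stack one by one. Each character is an immediate operand inside a separate mov instruction, so the characters are not adjacent in the object file. Your char *str = "Hello";, by contrast, is emitted under a label as one contiguous run of bytes, and that is what the strings utility picks up, since it searches for sequences of printable characters.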
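To make this concrete, here is a rough sketch of the kind of scan strings performs. It is not the actual binutils implementation. The name count_printable_runs and the two sample arrays are made up for illustration, and the bytes in sample_code are a hand-written approximation of how the mov instructions above could be encoded on x86-64.

```c
#include <ctype.h>
#include <stddef.h>

/* Count runs of at least min_len consecutive printable bytes in buf.
   This only approximates what strings does (its default minimum is 4). */
size_t count_printable_runs(const unsigned char *buf, size_t n, size_t min_len) {
    size_t runs = 0, len = 0;
    for (size_t i = 0; i < n; i++) {
        if (isprint(buf[i])) {
            len++;
        } else {
            if (len >= min_len)
                runs++;
            len = 0;
        }
    }
    if (len >= min_len)   /* a run may extend to the end of the buffer */
        runs++;
    return runs;
}

/* Contiguous bytes, as emitted for the string literal. */
static const unsigned char sample_data[] = {'H', 'e', 'l', 'l', 'o', 0};

/* Assumed encoding of the six byte-sized mov instructions: each character
   is separated from the next by opcode and displacement bytes. */
static const unsigned char sample_code[] = {
    0xC6, 0x45, 0xF2, 0x48,  0xC6, 0x45, 0xF3, 0x65,
    0xC6, 0x45, 0xF4, 0x6C,  0xC6, 0x45, 0xF5, 0x6C,
    0xC6, 0x45, 0xF6, 0x6F,  0xC6, 0x45, 0xF7, 0x00,
};
```

With a minimum length of 4, the scan finds one run in sample_data and none in sample_code, because no printable byte in the instruction stream has a printable neighbour.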

String literals are often stored in a read-only section of memory, while compound literal char arrays that are not const-qualified are ordinary modifiable objects.

For example, given this code:

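#include <stdio.h>

int main(void) {
    char *x = (char[]){'H', 'e', 'l', 'l', 'o', 0};  // non-const, so writing to it is well-defined
    char *y = "Hello";                               // points at the string literal itself
    x[0] = 'X';
    printf("x=%s\n", x);
    y[0] = 'X';                                      // undefined behavior: modifies a string literal
    printf("y=%s\n", y);
    return 0;
}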

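The modification to x[0] succeeds and the modified string is printed, while the attempted change to y[0] typically results in a segfault: modifying a string literal is undefined behavior, and the literal usually lives in read-only memory.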
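If a modifiable "Hello" is what is needed, one option is to initialize a char array from the literal, which gives a private writable copy. The sketch below uses a hypothetical helper, modified_greeting, to show this.

```c
#include <stddef.h>

/* Writes a modified greeting into out (capacity n). The local array is
   initialized from the literal, so it is a separate, writable copy.
   (Hypothetical helper, for illustration only.) */
void modified_greeting(char *out, size_t n) {
    char z[] = "Hello";   /* array copy, not a pointer to the literal */
    z[0] = 'X';           /* well-defined: z is an ordinary char array */
    size_t i = 0;
    for (; i + 1 < n && z[i] != '\0'; i++)
        out[i] = z[i];
    if (n > 0)
        out[i] = '\0';
}
```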
