
Explanation of the string comparison differences between char* and array strings

Okay, I thought I knew everything about pointers and memory operations, but one thing puzzles me. So far I've only compared strings with strcmp, but...

This code reports that the strings are equal:

#include <stdio.h>

int main(void)
{
    char* str1 = "I love StackOverflow"; // points to a string literal (static storage, usually read-only memory)
    char* str2 = "I love StackOverflow";

    if(str1 == str2) printf("%s and %s are equal\n", str1, str2);
    else printf("%s and %s are not equal\n", str1, str2);

    return 0;
}

Does this compare the contents of the memory blocks that str1 and str2 point to? And if I use

char str1[] = "I love StackOverflow"; // two separate arrays, stored on the stack
char str2[] = "I love StackOverflow";

instead, the program no longer says they are equal. Why?

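In the first example, str1 == str2 compares the two pointers, not the characters they point to, and there is absolutely no guarantee that the two pointers are equal. If they are, it is because of an optimization performed by your compiler, which exploits the fact that string literals are effectively immutable in C (modifying one is undefined behavior).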

C99 Rationale document says:

"This specification allows implementations to share copies of strings with identical text, to place string literals in read-only memory, and to perform certain optimizations"

You should not rely on this. If you want to compare strings, in either your first or your second code snippet, use the strcmp or strncmp functions.

Some compilers try to reduce memory requirements by storing a single copy of identical string literals.

As in your case, the compiler might store "I love StackOverflow" just once and make both str1 and str2 point to it. When you write str1 == str2, you are comparing pointers to the first elements of the string literals, not the strings themselves. Because both may point to the same location, the comparison can come out equal, but you can't rely on it.

For char* variables, the compiler may perform string "unification" (commonly called string pooling or constant merging), which means it detects identical string constants and stores each of them only once. That means the first code is effectively compiled as

static char common_string_detected_1[] = "I love StackOverflow"; // one shared copy with static storage duration

char* str1 = common_string_detected_1;
char* str2 = common_string_detected_1;

and str1 and str2 contain the same pointer: the address of the 'I' at the start of the array.

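In the latter case you explicitly declare two arrays. Each is a separate object initialized with its own copy of the characters, and distinct objects always have distinct addresses, so the compiler keeps them separate and str1 == str2 is false.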

I can show this with assembly listings produced by gcc (32-bit x86, Intel syntax).

The C code:

char* str1 = "I love StackOverflow";
char* str2 = "I love StackOverflow";

if(str1 == str2) printf("%s and %s are equal", str1, str2);
else printf("%s and %s are not equal", str1, str2);

The corresponding assembly:

LC0: // LC0, LC1 and LC2 are labels
    .ascii "I love StackOverflow\0" // emitted only once, shared by str1 and str2
LC1:
    .ascii "%s and %s are equal\0"
LC2:
    .ascii "%s and %s are not equal\0"

...

mov DWORD PTR [esp+28], OFFSET FLAT:LC0
mov DWORD PTR [esp+24], OFFSET FLAT:LC0 // moves exact same address into stack
mov eax, DWORD PTR [esp+28] // immediately moves one of them into eax
cmp eax, DWORD PTR [esp+24] // now compares the exact same addresses (LC0)
jne L2 // (jump if not equal)
// followed by the code that prints "are equal", then label L2 and the code that prints "are not equal"

Now the same code using char str1[] and char str2[]:

LC0:
    .ascii "%s and %s are not equal\0"

...

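// each 32-bit immediate holds four characters of the string in little-endian order, e.g. 1869357129 = 0x6F6C2049 = "I lo"
mov DWORD PTR [esp+43], 1869357129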
mov DWORD PTR [esp+47], 1394632054
mov DWORD PTR [esp+51], 1801675124
mov DWORD PTR [esp+55], 1919252047
mov DWORD PTR [esp+59], 2003790950
mov BYTE PTR [esp+63], 0
mov DWORD PTR [esp+22], 1869357129
mov DWORD PTR [esp+26], 1394632054
mov DWORD PTR [esp+30], 1801675124
mov DWORD PTR [esp+34], 1919252047
mov DWORD PTR [esp+38], 2003790950
mov BYTE PTR [esp+42], 0

lea eax, [esp+22]
mov DWORD PTR [esp+8], eax
lea eax, [esp+43]
mov DWORD PTR [esp+4], eax 
// loads the addresses of the two arrays on the stack as printf's arguments
// notice these two addresses are different, because the arrays sit in different places on the stack
// gcc doesn't even emit a comparison: distinct objects can never share an address, so str1 == str2 is known to be false and the "are equal" branch and its format string have been removed
mov DWORD PTR [esp], OFFSET FLAT:LC0
call    _printf

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM