
Explanation about the string comparison differences between char* and array strings

Okay, I thought I knew everything about pointers and memory operations, but one thing puzzles me. So far I've only compared strings with strcmp, but...

In this code, the comparison says the strings are equal:

#include <stdio.h>

int main(void)
{
    char* str1 = "I love StackOverflow"; // points to a string literal (static storage)
    char* str2 = "I love StackOverflow";

    if(str1 == str2) printf("%s and %s are equal\n", str1, str2);
    else printf("%s and %s are not equal\n", str1, str2);

    return 0;
}

Does this compare the contents of the memory blocks that str1 and str2 point to? If so, then if we use:

char str1[] = "I love StackOverflow"; // saving them on stack
char str2[] = "I love StackOverflow";

instead, it doesn't print that they are equal. Why?

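In the first example there is absolutely no guarantee that the two pointers are equal. Their equality comes from an optimization performed by your compiler, which exploits the fact that string literals cannot be modified in C (attempting to do so is undefined behavior). The C standard explicitly leaves it unspecified whether identical string literals are distinct arrays.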

The C99 Rationale says:

"This specification allows implementations to share copies of strings with identical text, to place string literals in read-only memory, and to perform certain optimizations"

You should not rely on this. If you want to compare strings, in either your first or your second code snippet, use the strcmp / strncmp functions.

Some compilers try to reduce memory requirements by storing a single copy of identical string literals.

As in your case, the compiler might choose to store "I love StackOverflow" just once, making both str1 and str2 point to it. So when you compare str1 == str2, you are comparing pointers to the first element of each string literal (not the strings themselves). Those pointers may point to the same location, as described above, which makes it look as if the two string literals are equal. You can't rely on it.

In the case of char* variables, the compiler may perform string 'unification' (I'm not sure of the official name; it is commonly called string pooling or constant merging), which means it detects identical string constants and allocates each of them only once. That means the first code is compiled as if it were:

char common_string_detected_1[] = "I love StackOverflow";

char* str1 = common_string_detected_1;
char* str2 = common_string_detected_1;

so str1 and str2 contain the same pointer: the address of 'I', the first element of that array.

In the latter case you explicitly declare two arrays, and the compiler keeps them separate: two distinct objects always have distinct addresses.

I can show you this with assembly listings, compiled with GCC:

Source code:

char* str1 = "I love StackOverflow";
char* str2 = "I love StackOverflow";

if(str1 == str2) printf("%s and %s are equal", str1, str2);
else printf("%s and %s are not equal", str1, str2);

Assembly:

LC0: // LC0 - LC1 - LC2 these are labels
    .ascii "I love StackOverflow\0"
LC1:
    .ascii "%s and %s are equal\0"
LC2:
    .ascii "%s and %s are not equal\0"

...

mov DWORD PTR [esp+28], OFFSET FLAT:LC0
mov DWORD PTR [esp+24], OFFSET FLAT:LC0 // moves exact same address into stack
mov eax, DWORD PTR [esp+28] // immediately moves one of them into eax
cmp eax, DWORD PTR [esp+24] // now compares the exact same addresses (LC0)
jne L2 // (jump if not equal)
// followed by the code that prints "equal", then label L2 (followed by the code that prints "not equal")

Now using [] (two arrays) instead:

LC0:
    .ascii "%s and %s are not equal\0"

...

mov DWORD PTR [esp+43], 1869357129
mov DWORD PTR [esp+47], 1394632054
mov DWORD PTR [esp+51], 1801675124
mov DWORD PTR [esp+55], 1919252047
mov DWORD PTR [esp+59], 2003790950
mov BYTE PTR [esp+63], 0
mov DWORD PTR [esp+22], 1869357129
mov DWORD PTR [esp+26], 1394632054
mov DWORD PTR [esp+30], 1801675124
mov DWORD PTR [esp+34], 1919252047
mov DWORD PTR [esp+38], 2003790950
mov BYTE PTR [esp+42], 0

lea eax, [esp+22]
mov DWORD PTR [esp+8], eax
lea eax, [esp+43]
mov DWORD PTR [esp+4], eax
// loads the effective addresses of both strings' data on the stack
// notice these two addresses are different, because the two strings sit in different places on the stack
// it doesn't even bother comparing them and has removed the "are equal" string
mov DWORD PTR [esp], OFFSET FLAT:LC0
call    _printf

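Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.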

 
© 2020-2024 STACKOOM.COM