How are const char* strings compared?

First, consider this example:

#include <iostream>
using namespace std;

int main()
{
    cout << ("123" == "123");
}

What I expect: since "123" is a const char*, I expect the ADDRESSES of these strings to be compared (as one of these answers said):

... because != and == will only compare the base addresses of those strings. Not the contents of the strings themselves.

But the output is still 1. Okay, we don't actually know how to compare the addresses of two prvalue objects (or at least I don't understand how it would be done). So let's declare these strings as variables and see what happens:

#include <iostream>
using namespace std;

int main()
{
    const char* a = "1230";
    const char* b = "1230";
    cout << (a == b);
}

The output is still 1. So const char* strings do not decay? Or did the compiler manage to do some optimization and allocate memory for only one string? OK, let's try to prevent that:

#include <iostream>
using namespace std;

int main()
{
    const char* a = "1230";
    const char* b = "1231";
    b = "1230";
    cout << (a == b);
}

The result is still the same, which made me think that const char* really does not decay. But that didn't make my life any simpler. How, then, are const char* values compared?

Why is the output 1 here:

#include <iostream>
using namespace std;

int main()
{
    const char* a = "1230";
    const char* b = "1231";
    cout << (a > b);
}

a is less than b in terms of lexicographical comparison, but here a compares greater. How, then, is comparison of const char* values implemented?

Yes, the linked answer is correct. operator== for pointers just compares the addresses, never the content they point to.

Furthermore, the compiler is free, but not required, to de-duplicate string literals, so that all occurrences of a string literal are the same object with the same address. That is what you observe, and the re-assignment b = "1230"; won't stop that.

[lex.string.14] Evaluating a string-literal results in a string literal object with static storage duration, initialized from the given characters as specified above. Whether all string-literals are distinct (that is, are stored in nonoverlapping objects) and whether successive evaluations of a string-literal yield the same or a different object is unspecified.

What should const char* decay to? Arrays decay, pointers don't.

#include <iostream>
using namespace std;

int main()
{
    const char* a = "1230";
    const char* b = "1231";
    cout << (a > b);
}

prints 1 just because a happens to point to a higher address than b; no lexicographical comparison is done. Just use std::string or std::string_view if you require that.

The storage details of string literals are completely unspecified by the C++ standard (except for their lifetime) and are entirely at the compiler's discretion. For example:

const char *a="ABCDEFG";
const char *b="DEFG";

It is entirely possible for a smart compiler to produce only one string out of this and set the second pointer to point into the middle of that string.

It is also possible for identical string literals that come from different .cpp files to produce just a single string in the final, linked executable, so that both strings, originally compiled in entirely separate .cpp files, end up having the same pointer value.

Similarly, the result of a pointer comparison is unspecified in all cases that the C++ standard does not explicitly cover. The relational operators (<, >, <=, >=) have a specified result mostly for pointers to elements of the same array or vector; otherwise the result is, in general, unspecified. The C++ standard does provide a way to get a total order over pointers (std::less), but that's not relevant here.

To summarize: otherwise, you cannot expect any specific behavior from pointer values or attach any particular meaning to them.

In this comparison

"123" == "123"

the string literals, which have the type const char[4], are implicitly converted to pointers to their first elements, and these pointers are compared.

The result depends on the compiler and its options, which determine whether identical string literals are stored as one string literal or as separate string literals.

As for this program

#include <iostream>
using namespace std;

int main()
{
    const char* a = "1230";
    const char* b = "1231";
    cout << (a > b);
}

then you should not use the operator > with pointers that do not point to elements of the same array. The result of such a comparison is unspecified in C++ (and the behavior is undefined in C).

In practice, the result depends on the order in which the compiler places the string literals in the string literal pool.

I expect ADDRESSES (like one of these answers said) of these strings to be compared.

Correct, that is what happens in both C and C++. When C-strings (char arrays) or string literals are compared with == in C and C++, only their addresses are compared.

Or compiler managed to do some optimizations and allocate memory only for one string?

Yes. Precisely. The compiler sees "1230" twice and may (in your/our case it does, which is why we see this behavior) use the exact same string at the exact same memory location for both of them in the code below. Therefore, they have the same address. This is a nice optimization that C and C++ compilers may make for you.

#include <iostream>
using namespace std;

int main()
{
    const char* a = "1230";
    const char* b = "1230";
    cout << (a == b);
}

Going further:

The fact that compilers commonly make this optimization means that you can write things like the following, even on memory-constrained embedded systems, and usually expect that the program space used does not increase by the size of the string literal each time you use it:

printf("some very long string\n");
printf("some very long string\n");
printf("some very long string\n");
printf("some very long string\n");

"some very long string" is then typically stored in memory only once.

That being said, if you change even a single character in one of those strings, the compiler may store it as a new string in memory, so in the case above you're better off doing this anyway:

constexpr char MY_MESSAGE[] = "some very long string\n";
// OR:
// #define MY_MESSAGE "some very long string\n"

printf(MY_MESSAGE);
printf(MY_MESSAGE);
printf(MY_MESSAGE);
printf(MY_MESSAGE);

See also:

  1. Why do (only) some compilers use the same address for identical string literals?
