Why are C-strings (char arrays) getting merged together or printing erroneously sometimes in C?

I have been working with strings in C. While trying out different ways to declare and initialize them, I found some weird behavior I don't understand.

#include<stdio.h>
#include<string.h>

int main()
{
    char str[5] = "World";
    char str1[] = "hello";
    char str2[] = {'N','a','m','a','s','t','e'};
    char* str3 = "Hi";

    printf("%s %zu\n"
           "%s %zu\n"
           "%s %zu\n"
           "%s %zu\n",
           str, strlen(str),
           str1, strlen(str1),
           str2, strlen(str2),
           str3, strlen(str3));

    return 0;
}

Sample output:

Worldhello 10
hello 5
Namaste 7
Hi 2

In some cases, the above code makes str contain Worldhello, and the rest are as they were initialized. In some other cases, the above code makes str2 contain Namastehello. It happens with different variables I never concatenated. So, how are they getting combined?

To work with strings, you must allow space for a null character at the end of each string. With char str[5] = "World"; you allow only five characters, and the compiler fills them with "World", leaving no space for a null character after them. Although the string literal "World" includes an automatic null character at its end, you did not provide space for it in the array, so it is not copied.

With char str1[] = "hello"; the compiler determines the array size by counting the characters, including the null character at the end of the string literal.

With char str2[] = {'N','a','m','a','s','t','e'}; there is no string literal, just a list of individual characters. The compiler determines the array size by counting those. Since the list contains no null character, the array has no space for one.

One potential consequence of failing to terminate a string with a null character is that printf will continue reading memory beyond the array and print characters from whatever values it finds there. When the compiler has placed other character arrays after the array you are printing, characters from those arrays may appear in the output.

If you allow space for a null character in str and provide a zero value in str2, your program will print the strings in an orderly way:

#include <stdio.h>
#include <string.h>

int main(void)
{
    char str[6] = "World"; // 5 letters plus a null character.
    char str1[] = "hello";
    char str2[] = {'N', 'a', 'm', 'a', 's', 't', 'e',  0}; // Include a null.
    char *str3 = "Hi";
    printf("%s %zu\n%s %zu\n%s %zu\n%s %zu\n",
        str,  strlen(str),
        str1, strlen(str1),
        str2, strlen(str2),
        str3, strlen(str3));
    return 0;
}

Undefined behavior in non-null-terminated, adjacently-stored C-strings

Why do you get this part:

Worldhello 10
hello 5

...instead of this?

World 5
hello 5

The answer is that printf() prints chars until it hits a null character, which is a binary zero, frequently written as the '\0' char. The compiler also happens to have placed the character array containing hello right after the character array containing World. Since you explicitly forced the size of str to be 5 via str[5], the compiler was unable to fit the automatic null character at the end of the string. So, with hello happening to be (not guaranteed to be) right after World, and printf() printing until it sees a binary zero, it printed World, saw no terminating null char, and continued right on into the hello string after it. This resulted in it printing Worldhello and stopping only when it saw the terminating character after hello, which is properly terminated.

This code relies on undefined behavior, which is a bug. It cannot be relied upon. But that is the explanation for this case.

Run it with gcc on a 64-bit Linux machine online here: Online GDB: undefined behavior in NON null-terminated C strings

@Eric Postpischil has a great answer that provides more insight.

From the C tag wiki:

This tag should be used with general questions concerning the C language, as defined in the ISO 9899 standard (the latest version, 9899:2018, unless otherwise specified; also tag version-specific requests with c89, c99, c11, etc).

You've asked a "how?" question about something that none of those documents defines, and so the answer is undefined in the context of C. You can only experience this phenomenon through undefined behaviour.

how are they getting combined?

There is no requirement that any of these variables be "combined" or located immediately after one another; trying to observe that is undefined behaviour. It may appear to coincidentally work (whatever that means) for you at times on your machine, while failing at other times or on some other machine or compiler, etc. That's purely coincidental and not to be relied upon.

In some cases, the above code assigns str with Worldhello and the rest as they were initialized.

In the context of undefined behaviour, it makes no sense to make claims about how your code functions. As you've already noticed, the functionality is erratic.

I found some weird behaviour with them.

If you want to prevent erratic behaviour, stop invoking undefined behaviour by accessing arrays out of bounds (i.e. causing strlen to run off the end of an array).

Only two of those variables (str1 and str3) are safe to pass to strlen; for the others, you need to ensure the array contains a null terminator.
