
Confusion on strings in C programming

So I am learning to program in C using the Dev-C++ compiler. Question 1:

#include <stdio.h> 
#include <conio.h> //for the getch() function
#include <string.h> 

int main(void) 
{ 
    char line[3]; 
    strcpy(line, "Hello world"); 
    printf("%s", line); 
    getch(); 
} 

Output: Hello world

Why does it display all of "Hello world" when I declared my string to hold only 3 characters?

Question 2:

char line[3] = "Hello world"; 
printf("%s", line); 

Output: Hel

Why does it display "Hel"? Shouldn't it display only "He", since line[0] = 'H', line[1] = 'e' and line[2] = '\0'? And doesn't %s work by searching for a '\0'?

Please help me understand what's really happening. Thanks!

Please help me understand what's really happening.

Undefined behaviour!

When you do this, you have a buffer overrun:

char line[3]; 
strcpy(line, "Hello world"); 

Why does it display all of "Hello world" when I declared my string to hold only 3 characters?

You're copying more than the allocated array can hold. This is undefined behaviour, so any outcome is possible, including but not limited to calling Aunt Tilda, formatting your hard disk, etc. :) See here for more.


char line[3] = "Hello world"; 
printf("%s", line); 

Here you have a buffer over-read! Refer to alk's answer on why only 3 characters get copied to line.

Why does it display "Hel"? Shouldn't it display only "He"?

No, it can display anything, again because of undefined behaviour. Here is the output I get on my machine:

Hel☻

This is undefined behaviour because printf expects a null-terminated string, yes, but that doesn't mean you may access memory beyond the end of an array. That is, you have an array like this in memory:

                 [0] [1] [2]
-----------------------------------------------
. . . █ | █ | █ | H | e | l | █ | █ | █ | . . .
-----------------------------------------------
                 <-- line --->

Anything written as █ above is an unknown value that is not under your control, so accessing it is undefined. However, %s in printf expects a null-terminated string and thus, on your orders, it reads beyond what is allowed (only the three elements up to l). In my case \0 appeared one element after l (hence the smiley), while in your case it came right after l, so the output looked correct, but only by luck; it may just as well appear 1000 elements later.


If you really want to print a char array that is not null-terminated, only up to the allowed limit, you can do one of these without hitting any undefined behaviour.

printf("%.3s", line);       // length specified at compile-time

printf("%.*s", 3, line);    // length fed at run-time

See here for further information.

Regarding Question 2:

When a string literal is used as an initialiser, the 0-terminator is stored only if there is room for it.

From the C99 Standard:

6.7.8 Initialization

[...]

14 An array of character type may be initialized by a character string literal, optionally enclosed in braces. Successive characters of the character string literal (including the terminating null character if there is room or if the array is of unknown size) initialize the elements of the array.

Both example programs have undefined behaviour. In the first example you overwrite memory beyond the array. In the second example, C does not allow more initializers than the object can accept.

2 No initializer shall attempt to provide a value for an object not contained within the entity being initialized.

The only exception is made for character arrays, which are allowed to omit the terminating zero:

14 An array of character type may be initialized by a character string literal or UTF−8 string literal, optionally enclosed in braces. Successive bytes of the string literal (including the terminating null character if there is room or if the array is of unknown size) initialize the elements of the array.

So the second code snippet should not compile, or at least the compiler must issue a diagnostic message.

Why does it display all of "Hello world" when I declared my string to hold only 3 characters?

Because printf() reads a string up to a null terminator. It doesn't know how big the storage is, and neither does strcpy(); if you want to make sure the copy doesn't exceed the size of the storage, use strncpy() (notice the n in the middle). Note that strncpy() does not add a null terminator when the source is at least as long as the limit, so you have to terminate the buffer yourself.

Why does it display "Hel"?

There doesn't have to be an explanation for this, since you've already overflowed the buffer; that can have any kind of bizarre consequence for the program. You could have overwritten something else (and conversely, your data might be overwritten later). If you break the rules, you are most likely invoking "undefined behaviour".

In this case it could be that the compiler wrote only 3 characters because of the form of the initialization, but that is not something to count on; there are not necessarily rules for what happens when you break the rules.
