C character array and its length

I am currently learning C from "C Programming Absolute Beginner's Guide" (3rd Edition), which says that every character array should have a size equal to the string length + 1 (the extra element holds the string-terminating zero). But this code:

#include <stdio.h>
int main(void)
{
    char name[4] = "Givi";
    printf("%s\n",name);
    return 0;
}

outputs Givi and not Giv. The array size is 4, so I expected Giv: 4 (string length) + 1 (string-terminating zero character) = 5, but the character array has only 4 elements.

Why does my code output Givi and not Giv?

I am using MinGW 4.9.2 SEH for compilation.

You are hitting what is considered undefined behavior. It works now, but by chance, not because it is correct.

In your case, it is probably because the memory in your program is all zeroed out at the beginning. So even though your string is not properly terminated, the memory right after it happens to be zero, and printf knows where to stop.

+-----------------------+
|G|i|v|i|\0|\0|...      |
+-----------------------+
| your  | rest of       |
| stuff | memory (stack)|
+-----------------------+

Other languages, such as Java, have safeguards against this sort of situation. Languages like C, however, do less hand-holding, which on the one hand allows more flexibility, but on the other gives you many more ways to shoot yourself in the foot with subtle issues such as this one. In other words, the fact that your code compiles does not mean it is correct, and it may blow up now, in 5 minutes, or in 5 years.

In real life, the memory after your array is almost never conveniently zeroed, and your string might end up stored next to other things, which would then get printed together with your string. You never want this. Situations like this can lead to crashes, exploits and leaked confidential information.

See the following diagram for an example. Imagine you're working on a web server and the string "secret" (a user's password or key) is stored right next to your harmless string:

+-----------------------+
|G|i|v|i|s|e|c|r|e|t    |
+-----------------------+
| your  | rest of       |
| stuff | memory (stack)|
+-----------------------+

Every time you output what you think is "Givi", you would end up printing the secret string as well, which is not what you want.

What your book states is basically right, but the phrase "at least" is missing. The array can very well be larger.

You already stated the reason for the minimum length requirement. So what does that tell you about the example? It is crap!

What it exhibits is called undefined behaviour (UB), and it might result in demons flying out of your nose. The UB is in the printf() call, not in the initializer. The C standard does not define what happens here (more precisely, the standard explicitly says this is UB), so the compiler (and your libraries) are not required to behave in any particular way.

In such cases, no terminator is appended, so the string is not properly terminated when it is passed to printf().

The reason this does not produce an error is likely legacy code that exploited this to save a few bytes of memory. So, instead of reporting an error that the implicit trailing '\0' terminator does not fit, the compiler simply does not append it. Silently truncating the string literal would also be a bad idea.

The byte after the last character always has to be 0; otherwise printf does not know where the string ends and keeps reading bytes (chars) until it finds one that is 0.

As Andrei said, it apparently just happened that the compiler put at least one byte with the value 0 after your string data, so printf recognized the end of the string.

This can vary from compiler to compiler, which is why it is undefined behaviour.

For instance, printf could end up accessing an address that your program is not allowed to access. This would result in a crash.

In C, text strings are stored as zero-terminated arrays of characters. This means that the end of a text string is indicated by a special character with a numeric value of zero (0).

So the array of characters used to store a C text string must include an array element for each of the characters as well as an additional array element for the end-of-string marker.

The C text string functions (strcpy(), strcmp(), strcat(), etc.) all expect that the end of a text string is indicated by a value of zero. This includes the printf() family of functions that print or output text to the screen or to a file. Since these functions depend on seeing a zero value to terminate the string, one source of errors when using C text strings is copying too many characters, either because of a missing zero terminator or because a long text string is copied into a smaller buffer. This type of error is known as a buffer overflow.

The C compiler will perform some adjustments for you automatically. For instance:

const char *pText = "four";  // pointer to a string constant; the compiler stores "four" followed by a zero
char text[] = "four";        // compiler creates an array of 5 elements: 'f', 'o', 'u', 'r', and a value of 0 in the fifth
char text5[5] = "four";      // programmer asks for 5 elements: same contents, with a value of 0 in the fifth

In the example you provided, a good C compiler should issue at least a warning and possibly an error. However, it looks like your compiler stores only as many characters as fit in the array and does not add the zero string terminator, and you are getting lucky in that there happens to be a zero value after the end of the array. I suppose it is also possible that the C compiler is adding an additional array element anyway, but that seems unlikely.

The following line:

char name[4] = "Givi";

may give a warning like:

string for array of chars is too long

Because the behavior is undefined, the compiler may still accept it. But if you debug, you will see:

name[0]                   'G'
name[1]                   'i'
name[2]                   'v'
name[3]                   '\0'

And so the output is

Giv

Not Givi as you mentioned in the question!

I'm using the GCC compiler.

But if you write something like this:

char name[4] = "Giv";

it compiles fine, and the output is

Giv
