
How to make a non-null-terminated C string?

I am wondering: given char *cs = .....; what will happen to strlen() and printf("%s",cs) if cs points to a memory block which is huge but has no '\0' in it? I wrote these lines:

 char s2[3] = {'a','a','a'};
printf("str is %s,length is %d",s2,strlen(s2));

I get the result "aaa", "3", but I think that is only because a '\0' (a 0 byte) happens to reside at location s2+3. How do I make a non-null-terminated C string? strlen and the other C string functions rely heavily on the '\0' byte; what happens if there is no '\0'? I just want to understand this rule more deeply.

PS: my curiosity was aroused by studying the following post on SO, "How to convert a const char * to std::string", and these words in that post: "This is actually trickier than it looks, because you can't call strlen unless the string is actually nul terminated."

If it's not null-terminated, then it's not a C string, and you can't use functions like strlen - they will march off the end of the array, causing undefined behaviour. You'll need to keep track of the length some other way.

You can still print a non-terminated character array with printf, as long as you give the length as a precision:

printf("str is %.3s",s2);
printf("str is %.*s",s2_length,s2);

or, if you have access to the array itself, not a pointer:

printf("str is %.*s", (int)(sizeof s2), s2);

You've also tagged the question C++: in that language, you usually want to avoid all this error-prone malarkey and use std::string instead.

A "C string" is, by definition, null-terminated. The name comes from the C convention of having null-terminated strings. If you want something else, it's not a C string.

So if you have a string that is not null-terminated, you cannot use the C string manipulation routines on it. You can't use strlen, strcpy or strcat. Basically, any function that takes a char* but no separate length is not usable.

Then what can you do? If you have a string that is not null-terminated, you will have the length separately. (If you don't, you're screwed. You need some way to find the length, either by a terminator or by storing it separately.) What you can do is allocate a buffer of the appropriate size, copy the string over, and append a null. Or you can write your own set of string manipulation functions that work with a pointer and a length. In C++ you can use the std::string constructor that takes a char* and a length; that one doesn't need the terminator.

What happens is that strlen keeps going, reading memory until it eventually reaches a zero byte. It then assumes that is the terminator and returns a length that could be massively large. If you're using strlen in an environment that expects C strings, you could then copy this huge run of data into a buffer that is not big enough, causing buffer overrun problems; at best, you copy a large amount of garbage data into your buffer.

Copying a non-null-terminated character array into a std::string without giving a length will do this. If you then decide that you know this string is only 3 characters long and discard the rest, you will still have a massively long std::string that contains the first 3 good characters and then a load of wastage. That's inefficient.

The moral is, if you're using the CRT functions to operate on C strings, they must be null-terminated. It's no different to any other API: you must follow the rules that the API sets down for correct usage.

Of course, there is no reason you cannot use the CRT functions if you always use the specific-length versions (e.g. strncpy), but you will have to limit yourself to just those, always, and manually keep track of the correct lengths.

Your supposition is correct: your strlen is returning the correct value out of sheer luck, because there happens to be a zero on the stack right after your improperly terminated string. It probably helps that the string is 3 bytes, and the compiler is likely aligning stuff on the stack to 4-byte boundaries.

You cannot depend on this. C strings need a NUL character (a zero byte) at the end to work correctly. C string handling is messy and error-prone; there are libraries and APIs that help make it less so, but it's still easy to screw up. :)

In this particular case, your string could be initialized as one of these:

  • A : char s2[4] = { 'a','a','a', 0 }; // good if string MUST be 3 chars long
  • B : const char *s2 = "aaa"; // if you don't need to modify the string after creation
  • C : char s2[] = "aaa"; // if you DO need to modify the string afterwards

Also note that declarations B and C are 'safer' in the sense that if someone comes along later and changes the string in a way that alters its length, B and C are still correct automatically, whereas A depends on the programmer remembering to change the array size and to keep the explicit null terminator at the end.

Convention states that a char array with a terminating \0 is a null-terminated string. This means that all str*() functions expect to find a null terminator at the end of the char array. But that's it, it's convention only.

Also by convention, strings should contain printable characters.

If you create an array like you did, char arr[3] = {'a', 'a', 'a'};, you have created a char array. Since it is not terminated by a \0 it is not called a string in C, although its contents can be printed to stdout.

What you have done is undefined behavior.

strlen and printf("%s") read past the end of s2, into memory that is not yours.

Change it to

char s2[] = {'a','a','a','\0'};

The C standard does not define the term string until section 7, Library. The definition in C11 7.1.1p1 reads:

  1. A string is a contiguous sequence of characters terminated by and including the first null character.

(emphasis mine)

If the definition of string is a sequence of characters terminated by a null character, then a sequence of non-null characters not terminated by a null is not a string, period.
