C++ tolower/toupper char pointer
Do you know why the following code crashes at runtime?
char* word;
word = new char[20];
word = "HeLlo";
for (auto it = word; it != NULL; it++){
*it = (char) tolower(*it);
}
I'm trying to lowercase a char* (string). I'm using Visual Studio.
Thanks
Comparing it to NULL won't stop the loop. Instead you should be comparing *it to '\0'. Or better yet, use std::string and never worry about it :-)
In summary, when looping over a C-style string, you should loop until the character you see is '\0'. The iterator itself will never be NULL, since it simply points to a place in the string. The fact that the iterator's type (here char*) can be compared to NULL tells you nothing about where the string ends, so don't use that comparison as the loop condition.
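To make the '\0' sentinel concrete, here is a minimal sketch of a hand-written length function. The name c_string_length is hypothetical; in real code std::strlen from <cstring> does the same job.

```cpp
#include <cstddef>  // std::size_t

// Hypothetical helper for illustration: counts characters up to the '\0'
// terminator, as std::strlen does. The loop stops on the *character*
// '\0', not on the pointer becoming NULL.
std::size_t c_string_length(const char* s) {
    std::size_t n = 0;
    for (const char* it = s; *it != '\0'; ++it) {
        ++n;  // it always points inside the string here; it never becomes NULL
    }
    return n;
}
```

For example, c_string_length("HeLlo") stops at the terminator after five characters.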
Additionally, you are trying to write to a string literal, which is a no-no :-). String literals are read-only, and modifying one is undefined behavior.
EDIT: As noted by @Cheers and hth. - Alf, tolower can break if given negative values. So, sadly, we need to add a cast to make sure this won't break if you feed it Latin-1 encoded data or similar.
This should work:
#include <cctype>  // std::tolower

char word[] = "HeLlo";  // a writable array initialized from the literal
for (auto it = word; *it != '\0'; ++it) {
    *it = std::tolower(static_cast<unsigned char>(*it));
}
You're setting word to point to the string literal, but literals are read-only, so assigning to *it results in undefined behavior. You need to make a copy of the literal in the dynamically allocated memory:
#include <cstring>  // std::strcpy

char *word = new char[20];
std::strcpy(word, "HeLlo");  // copy the literal into the writable buffer
Also, in your loop you should test *it != '\0'. The end of a string is indicated by the character being the null byte, not by the pointer being null.
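Putting this answer's two fixes together, a minimal sketch that keeps dynamic allocation might look like the following. The helper name lowercase_copy is hypothetical, it assumes the caller should own the returned buffer, and it also applies the unsigned char cast recommended in the previous answer.

```cpp
#include <cctype>   // std::tolower
#include <cstring>  // std::strlen, std::strcpy

// Hypothetical helper: returns a heap-allocated lowercase copy of src.
// The buffer is sized from the input (strlen + 1 for the '\0') rather than
// a fixed 20 bytes, and the caller must release it with delete[].
char* lowercase_copy(const char* src) {
    char* word = new char[std::strlen(src) + 1];
    std::strcpy(word, src);  // writable copy; the literal itself stays untouched
    for (char* it = word; *it != '\0'; ++it) {  // stop at the null byte, not a null pointer
        *it = static_cast<char>(std::tolower(static_cast<unsigned char>(*it)));
    }
    return word;
}
```

Usage would be along the lines of `char* word = lowercase_copy("HeLlo");` followed by `delete[] word;` once you are done with it.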
Given code (as I'm writing this):
char* word;
word = new char[20];
word = "HeLlo";
for (auto it = word; it != NULL; it++){
*it = (char) tolower(*it);
}
This code has Undefined Behavior in two distinct ways, and would also have UB in a third way if the text data were only slightly different:
1. Buffer overrun. The continuation condition it != NULL will not be false until the pointer it has wrapped around at the end of the address range, if it ever does; long before that, the loop runs past the end of the string.
2. Modifying read-only memory. The pointer word is set to point to the first char of a string literal, and then the loop iterates over that string and assigns to each char.
3. Passing a possibly negative value to tolower. The char classification functions require a non-negative argument (a value representable as unsigned char), or else the special value EOF. This works fine with the string "HeLlo" under an assumption of ASCII or an unsigned char type. But in general, e.g. with the string "Blåbærsyltetøy", directly passing each char value to tolower will result in negative values being passed where char is signed; a correct invocation with ch of type char is (char) tolower( (unsigned char)ch ).
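The third point can be checked directly. This sketch (both helper names are hypothetical) shows the int value that tolower would receive for the byte 0xE5, which is 'å' in Latin-1, with and without the cast. The negative value only appears where plain char is signed, as it is by default in Visual Studio.

```cpp
#include <climits>  // CHAR_MIN is negative exactly when plain char is signed

// Hypothetical helper: the int that tolower(ch) would receive with no cast.
// For the byte 0xE5 this is negative wherever char is signed, which is
// outside tolower's domain and therefore undefined behavior.
int argument_without_cast(char ch) {
    return ch;
}

// Hypothetical helper: the int that tolower((unsigned char)ch) receives.
// Every char value lands in 0..UCHAR_MAX, which tolower accepts.
int argument_with_cast(char ch) {
    return static_cast<unsigned char>(ch);
}
```

On a signed-char platform, argument_without_cast(static_cast<char>(0xE5)) is negative, while argument_with_cast gives 229 for the same byte on any platform.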
Additionally, the code has a memory leak: it allocates memory with new and then just forgets about it, because the next assignment overwrites the only pointer to that memory.
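If you do want a heap buffer, one way to rule out this kind of leak is to let a smart pointer own it. This is a swapped-in technique (RAII via std::unique_ptr<char[]>), not something the answer above proposes; the helper name writable_copy is hypothetical, and std::make_unique requires C++14.

```cpp
#include <cstring>  // std::strlen, std::memcpy, std::size_t
#include <memory>   // std::unique_ptr, std::make_unique

// Hypothetical helper: returns a writable copy of src, including its '\0'.
// The unique_ptr owns the array and calls delete[] automatically when it
// goes out of scope, so the memory cannot be forgotten.
std::unique_ptr<char[]> writable_copy(const char* src) {
    const std::size_t n = std::strlen(src) + 1;  // + 1 for the '\0'
    auto buf = std::make_unique<char[]>(n);
    std::memcpy(buf.get(), src, n);
    return buf;
}
```

Reassigning or destroying the returned unique_ptr releases the old buffer, which is exactly the step the original code skips.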
A correct way to code the apparent intent:
#include <cctype>   // std::tolower
#include <string>   // std::string

using Byte = unsigned char;

auto to_lower( char const c )
    -> char
{ return Byte( std::tolower( Byte( c ) ) ); }

// ...
std::string word = "Hello";
for( char& ch : word ) { ch = to_lower( ch ); }
There are already two nice answers on how to solve your issues using null-terminated C strings and pointers. For the sake of completeness, here is an approach using C++ strings:
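#include <cctype>  // std::tolower
#include <string>  // std::string

std::string word; // instead of char*
//word = new char[20]; // no longer needed: strings take care of themselves
word = "HeLlo"; // no worry about deallocating previous values: strings take care of themselves
for (auto &it : word) // range-based for, to iterate through all the string's elements
    it = (char) std::tolower((unsigned char) it); // cast first, so non-ASCII chars aren't passed as negative values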
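The same std::string approach can also be written with std::transform from <algorithm> instead of a range-based for loop. This is an alternative sketch, not the answer's code; the function name to_lower_copy is hypothetical. The lambda takes its parameter as unsigned char for the same reason as the casts elsewhere in this thread.

```cpp
#include <algorithm>  // std::transform
#include <cctype>     // std::tolower
#include <string>     // std::string

// Hypothetical helper: returns a lowercase copy of s. std::transform writes
// each converted character back into the same string.
std::string to_lower_copy(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}
```

Whether to prefer this over the range-based for loop is mostly a matter of taste; both touch each character exactly once.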
It crashes because you are modifying a string literal.
There are dedicated functions for this: strupr makes a string uppercase and strlwr makes it lowercase. Note that they are not part of standard C or C++; Visual Studio's runtime library provides them, preferring the underscore-prefixed names _strupr and _strlwr.
Here is a usage example:
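#include <stdio.h>   // printf
#include <string.h>  // strupr, strlwr: non-standard, provided by e.g. the MSVC runtime

char str1[] = "make me upper";
printf("%s\n", strupr(str1));
char str2[] = "make me lower";
printf("%s\n", strlwr(str2));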
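Because strupr and strlwr are non-standard, code that has to build outside Visual Studio may want portable stand-ins. The following is a minimal sketch; the names portable_strlwr and portable_strupr are hypothetical, and like the originals they modify the string in place and return the same pointer.

```cpp
#include <cctype>  // std::tolower, std::toupper

// Hypothetical portable stand-in for strlwr: lowercases s in place and
// returns s, so it can be passed straight to printf("%s\n", ...).
char* portable_strlwr(char* s) {
    for (char* it = s; *it != '\0'; ++it)
        *it = static_cast<char>(std::tolower(static_cast<unsigned char>(*it)));
    return s;
}

// Hypothetical portable stand-in for strupr: uppercases s in place.
char* portable_strupr(char* s) {
    for (char* it = s; *it != '\0'; ++it)
        *it = static_cast<char>(std::toupper(static_cast<unsigned char>(*it)));
    return s;
}
```

With these, the example above becomes `printf("%s\n", portable_strupr(str1));` and works with any standard library.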
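Statement: the technical posts on this site are published under the CC BY-SA 4.0 license. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.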