C++ tolower/toupper char pointer

Do you guys know why the following code crashes at runtime?

char* word;
word = new char[20];
word = "HeLlo"; 
for (auto it = word; it != NULL; it++){
    *it = (char) tolower(*it);
}

I'm trying to lowercase a char* (string). I'm using Visual Studio.

Thanks

You cannot compare it to NULL. Instead, you should compare *it to '\0'. Or better yet, use std::string and never worry about it :-)

In summary, when looping over a C-style string, you should loop until the character you see is '\0'. The iterator itself will never be NULL, since it simply points to a place in the string. The fact that the iterator has a type which can be compared to NULL is an implementation detail that you shouldn't touch directly.

Additionally, you are trying to write to a string literal, which is a no-no :-).

EDIT: As noted by @Cheers and hth. - Alf, tolower can break if given negative values. So sadly, we need to add a cast to make sure this won't break if you feed it Latin-1 encoded data or similar.

This should work:

char word[] = "HeLlo";
for (auto it = word; *it != '\0'; ++it) {
    *it = tolower(static_cast<unsigned char>(*it));
}

You're setting word to point to the string literal, but literals are read-only, so assigning to *it results in undefined behavior. You need to copy the literal into the dynamically allocated memory instead.

char *word = new char[20];
strcpy(word, "HeLlo");

Also, in your loop you should compare *it != '\0'. The end of a string is indicated by the character being the null byte, not by the pointer being null.

Given code (as I'm writing this):

char* word;
word = new char[20];
word = "HeLlo"; 
for (auto it = word; it != NULL; it++){
    *it = (char) tolower(*it);
}

This code has Undefined Behavior in 2 distinct ways, and would have UB in a third way if the text data were only slightly different:

  • Buffer overrun.
    The continuation condition it != NULL will not be false until the pointer it has wrapped around at the end of the address range, if it ever does.

  • Modifying read-only memory.
    The pointer word is set to point to the first char of a string literal, and then the loop iterates over that string and assigns to each char.

  • Passing a possibly negative value to tolower.
    The char classification functions require a non-negative argument, or else the special value EOF. This works fine with the string "HeLlo" under an assumption of ASCII or an unsigned char type. But in general, e.g. with the string "Blåbærsyltetøy", directly passing each char value to tolower will result in negative values being passed; a correct invocation with ch of type char is (char) tolower( (unsigned char)ch ).

Additionally, the code has a memory leak: it allocates some memory with new and then just forgets about it.

A correct way to code the apparent intent:

using Byte = unsigned char;

auto to_lower( char const c )
    -> char
{ return Byte( tolower( Byte( c ) ) ); }

// ...
string word = "Hello";
for( char& ch : word ) { ch = to_lower( ch ); }

There are already two nice answers on how to solve your issues using null-terminated C strings and pointers. For the sake of completeness, here is an approach using C++ strings:

string word;           // instead of char*
//word = new char[20]; // no longer needed: strings take care of themselves
word = "HeLlo";        // no worry about deallocating previous values: strings take care of themselves
for (auto &it : word)  // range-based for, to iterate through all the string elements
    it = (char) tolower((unsigned char) it);  // the cast avoids passing a negative value to tolower

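It crashes because you are modifying a string literal.

There are dedicated functions for this: strupr makes a string uppercase and strlwr makes it lowercase. Note that they are not part of standard C or C++; they are extensions offered by some compilers' libraries (Visual Studio provides them as _strupr and _strlwr).

Here is a usage example: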

char str[ ] = "make me upper";
printf("%s\n",strupr(str));


char str[ ] = "make me lower";
printf("%s\n",strlwr (str));
