C++ Strip non-ASCII Characters from string
Before you get started: yes, I know this is a duplicate question, and yes, I have looked at the posted solutions. My problem is that I could not get them to work.
bool invalidChar (char c)
{
return !isprint((unsigned)c);
}
void stripUnicode(string & str)
{
str.erase(remove_if(str.begin(),str.end(), invalidChar), str.end());
}
I tested this method on "Prusæus, Ægyptians," and it did nothing. I also tried using isalnum in place of isprint.
The real problem occurs when, in another section of my program, I convert string->wstring->string. The conversion balks if there are Unicode chars in the string during the string->wstring conversion.
Ref:
How can you strip non-ASCII characters from a string? (in C#)
How to strip all non alphanumeric characters from a string in c++?
Edit:
I would still like to remove all non-ASCII chars regardless, but in case it helps, here is where I am crashing:
// Convert to wstring
wchar_t* UnicodeTextBuffer = new wchar_t[ANSIWord.length()+1];
wmemset(UnicodeTextBuffer, 0, ANSIWord.length()+1);
mbstowcs(UnicodeTextBuffer, ANSIWord.c_str(), ANSIWord.length());
wWord = UnicodeTextBuffer; //CRASH
Error Dialog
MSVC++ Debug Library
Debug Assertion Failed!
Program: //myproject
File: f:\dd\vctools\crt_bld\self_x86\crt\src\isctype.c
Line: //Above
Expression: (unsigned)(c+1)<=256
Edit:
Further compounding the matter: the .txt file I am reading in from is ANSI encoded. Everything within should be valid.
Solution:
bool invalidChar (char c)
{
return !(c>=0 && c <128);
}
void stripUnicode(string & str)
{
str.erase(remove_if(str.begin(),str.end(), invalidChar), str.end());
}
If someone else would like to copy/paste this, I can check this question off.
EDIT:
For future reference: try using the __isascii and iswascii functions.
At least one problem is in your invalidChar function. It should be:
return !isprint( static_cast<unsigned char>( c ) );
Casting a char to an unsigned is likely to give some very, very big values if the char is negative (UINT_MAX+1 + c). Passing such a value to isprint is undefined behavior.
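To put that fix in context, here is a minimal sketch of the question's code with the cast corrected. The name stripUnprintable is hypothetical (the question calls it stripUnicode), and the sketch assumes the default "C" locale, in which isprint rejects every byte above 127.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Returns true for any char that std::isprint rejects in the current locale.
// Converting to unsigned char first keeps the argument inside the range
// that isprint is defined for (0..UCHAR_MAX, or EOF).
bool invalidChar(char c)
{
    return !std::isprint(static_cast<unsigned char>(c));
}

// Hypothetical name: removes every char for which invalidChar is true.
void stripUnprintable(std::string& str)
{
    str.erase(std::remove_if(str.begin(), str.end(), invalidChar), str.end());
}
```

Note that this also removes control characters such as tabs and newlines, since they are not printable either.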
Another solution that doesn't require defining two functions but uses anonymous functions (lambdas), available in C++11 and later:
void stripUnicode(string & str)
{
str.erase(remove_if(str.begin(),str.end(), [](char c){return !(c>=0 && c <128);}), str.end());
}
I think it looks cleaner.
isprint depends on the locale, so the character in question must be printable in the current locale.
If you want strictly ASCII, check the range for [0..127]. If you want printable ASCII, check the range and isprint.
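The two checks described above could be sketched roughly as follows. All four function names are hypothetical, chosen only for illustration, and the printable variant still goes through isprint, so it assumes the default "C" locale.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// True when c lies outside the 7-bit ASCII range [0..127].
bool notAscii(char c)
{
    return static_cast<unsigned char>(c) > 127;
}

// True when c is outside [0..127] or is not printable (e.g. a control char).
bool notPrintableAscii(char c)
{
    const unsigned char uc = static_cast<unsigned char>(c);
    return uc > 127 || !std::isprint(uc);
}

// Keeps every ASCII char, including tabs and newlines.
void stripNonAscii(std::string& str)
{
    str.erase(std::remove_if(str.begin(), str.end(), notAscii), str.end());
}

// Keeps only printable ASCII chars.
void stripNonPrintableAscii(std::string& str)
{
    str.erase(std::remove_if(str.begin(), str.end(), notPrintableAscii), str.end());
}
```

Doing the range check before calling isprint means the result for bytes above 127 does not depend on the locale at all.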