
Replace all non-ASCII characters in a string by their ASCII equivalent

Using Qt/C++, I need to generate a string containing only a subset of ASCII characters: letters, digits, hyphen, underscore, period, or colon.

As input, I can have anything.

So I try to apply some rules:

  • every character for which QChar::isSpace is true will be replaced with an underscore
  • every non-ASCII letter will be replaced with an ASCII equivalent (example: "é" will be replaced with "e")
  • every other non-ASCII character will be removed

Is there any simple way with Qt/C++ to apply the 2nd and 3rd rules?

Thanks

Yes, there is a way. First, apply Unicode normalization to your string with QString::normalized. Normalization is needed to separate diacritical marks from letters and to replace some fancy symbols with their ASCII equivalents. The normalization forms are described in Unicode Standard Annex #15.

Then keep only the characters that can be encoded in Latin-1. This can be tested with the toLatin1 method of QChar.

char QChar::toLatin1() const

Returns the Latin-1 character equivalent to the QChar, or 0. This is mainly useful for non-internationalized software.
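One caveat is worth spelling out: Latin-1 is a superset of ASCII, so a non-zero toLatin1() result does not by itself mean the character is ASCII. The sketch below is deliberately Qt-free so it compiles on its own. fitsLatin1 and isAscii are hypothetical helpers, not Qt functions; they are assumed to mirror the two checks on the UTF-16 code unit that a QChar holds (the value QChar::unicode() returns).

```cpp
// Hypothetical helper: true when QChar::toLatin1() would return a non-zero
// value, i.e. the code unit lies in U+0001..U+00FF.
constexpr bool fitsLatin1(char16_t u)
{
    return u != 0 && u <= 0x00FF;
}

// Hypothetical helper: true only for the 7-bit ASCII range U+0000..U+007F.
constexpr bool isAscii(char16_t u)
{
    return u < 0x0080;
}
```

For example, "ß" (U+00DF) and "ø" (U+00F8) have no Unicode decomposition, so they come out of normalization unchanged; fitsLatin1 accepts them while isAscii rejects them. If the output must be strictly ASCII, a test along the lines of isAscii is probably the safer filter.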

...

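#include <QDebug>
#include <QString>
#include <algorithm>
#include <iterator>

QString testString = QString::fromUtf8("Ceñía-üÏÖ马克ñ");
// Compatibility decomposition: "ñ" becomes "n" followed by a combining tilde.
QString normalized = testString.normalized(QString::NormalizationForm_KD);
QString result;

// Keep only the characters that have a Latin-1 representation.
std::copy_if(normalized.begin(), normalized.end(), std::back_inserter(result),
             [](const QChar &c) { return c.toLatin1() != 0; });

qDebug() << result; // Cenia-uIOn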
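To also cover the first rule and restrict the output to the exact alphabet from the question, the filtering step could be written as below. This is a minimal sketch, not the answer's method. It is Qt-free so it can be compiled standalone, and it assumes the text has already been decomposed (for instance with NormalizationForm_KD). The names isAllowed, isSpaceLike and toRestrictedAscii are hypothetical, and isSpaceLike is only a rough stand-in for QChar::isSpace, which recognizes more code points.

```cpp
#include <string>

// Hypothetical helper: the target alphabet from the question
// (ASCII letters, digits, hyphen, underscore, period, colon).
constexpr bool isAllowed(char16_t u)
{
    return (u >= u'a' && u <= u'z') || (u >= u'A' && u <= u'Z') ||
           (u >= u'0' && u <= u'9') ||
           u == u'-' || u == u'_' || u == u'.' || u == u':';
}

// Rough stand-in for QChar::isSpace(): ASCII whitespace plus two common
// non-ASCII spaces. This is an assumption; the Qt function covers more.
constexpr bool isSpaceLike(char16_t u)
{
    return u == u' ' || (u >= 0x0009 && u <= 0x000D) ||
           u == 0x0085 || u == 0x00A0;
}

// Expects UTF-16 text that was already decomposed by normalization.
std::string toRestrictedAscii(const std::u16string &decomposed)
{
    std::string out;
    out.reserve(decomposed.size());
    for (char16_t u : decomposed) {
        if (isSpaceLike(u))
            out.push_back('_');                   // rule 1: whitespace -> '_'
        else if (isAllowed(u))
            out.push_back(static_cast<char>(u));  // already in the target set
        // Anything else (combining marks, CJK, other punctuation) is dropped.
    }
    return out;
}
```

With Qt, the input could be obtained from normalized.toStdU16String() and the result converted back with QString::fromStdString(). Unlike the Latin-1 test, this version also drops ASCII characters outside the requested set, such as "!" or "/", and any Latin-1 letters that normalization leaves intact.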

