Comparing iterated characters in a std::string with a Unicode character in C++

Question

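I have been struggling with this problem for quite a while, and it is basically my first time dealing with Unicode or UTF-8.

Here is what I want to do: iterate over a std::string that contains a mix of ordinary letters and a Unicode symbol, in my case the en dash "–". More information: http://www.fileformat.info/info/unicode/char/2013/index.htm

This is the code I tried; it does not compile: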

#include <iostream>
#include <string>

int main()
{
    std::string str = "test string with symbol – and !";
    for (auto &letter : str) {
        if (letter == "–") {
            std::cout << "found!" << std::endl;
        }
    }
    return 0;
}

This is the output from my compiler:

main.cpp: In function 'int main()':
main.cpp:18:23: error: ISO C++ forbids comparison between pointer and 
integer [-fpermissive]
     if (letter == "–") {
                   ^

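While browsing the internet I also found a related question that looked relevant to this kind of task: "How to search for a non-ASCII character in a C++ string?"

But when I tried to modify my code to use the UTF-8 hex escapes from there, it did not compile either: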

    if (letter == "\xE2\x80\x93") {
        std::cout << "found!" << std::endl;
    }

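I get exactly the same message from my compiler: ISO C++ forbids comparison between pointer and integer.

What am I missing? Or do I need a library such as ICU or Boost? Any help is much appreciated. Thanks!

Update

Based on UnholySheep's answer I have been improving my code, but it still does not work. It compiles, but when I run it, it never prints "found!". So how can I fix this? Thanks.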

Answer 1

How about this code?

#include <iostream>
#include <string>

int main()
{
    std::wstring str = L"test string with symbol – and !";
    for (auto &letter : str) {
        if (letter == L'–') {
            std::cout << "found!" << std::endl;
        }
    }
    return 0;
}

Answer 2

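As UnholySheep said in the comments, the literal "–" is a string literal, that is, a char array, not a single char. Assuming a UTF-8 representation, char em_dash[] = "–"; is the same as char em_dash[] = {'\xe2', '\x80', '\x93', '\0'}; .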

With your current code you can only find characters that fit in a single char. For example, this works correctly:

...
if (letter == '!')
...

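because '!' is a char constant.

If you only need to handle Unicode characters in the Basic Multilingual Plane (code points up to 0xFFFF), then using wide characters as in @ArashMohammadi's answer is enough. An alternative that also covers characters outside the BMP (such as emoji) is std::u32string, where each Unicode character is represented by a single char32_t.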

If you want to process the UTF-8 encoded narrow string directly, you have to compare substrings with the compare method:

std::string em_dash = "–"; // or "\xe2\x80\x93"
...
    // pos + size <= str.size() avoids unsigned underflow when str is shorter than em_dash
    for (size_t pos = 0; pos + em_dash.size() <= str.size(); pos++) {
        if (str.compare(pos, em_dash.size(), em_dash) == 0) {
            std::cout << "found!" << std::endl;
        }
    }
...

Or use the find method directly:

...
    if (str.find(em_dash) != str.npos) {
        std::cout << "found!" << std::endl;
    }
...
