Why does boost::spirit::unicode::char_ no longer work with UTF-8 char* strings?

Question

With Boost 1.60, I could use #define BOOST_SPIRIT_UNICODE and boost::spirit::unicode::char_ to process UTF-8 input strings without any further preprocessing. With Boost 1.72, this fails with an exception.

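The solution seems to be to use boost::u8_to_u32_iterator and let Spirit work on wide strings. But why did it work so well in earlier versions, and how, if at all, can I reactivate the old behavior?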

Here is some sample code:

#define BOOST_SPIRIT_UNICODE
#include <boost/spirit/include/qi.hpp>
#include <iostream>
#include <string>

int main()
{
   typedef std::string::const_iterator iterator_type;
   namespace qi = boost::spirit::qi;
   namespace unicode = boost::spirit::unicode;

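   // UTF-8 encoded input; the hourglass (U+23F3) occupies three bytes
   std::string input("\"Test ⏳\"");
   // A quoted string: one or more characters other than '"', with no skipping inside the quotes
   qi::rule<iterator_type, std::string(), unicode::space_type> quoted_string = qi::lexeme['"' >> +(unicode::char_ - '"') >> '"'];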

   iterator_type iter = input.begin();
   iterator_type end = input.end();
   std::string output;
   bool r = phrase_parse(iter, end, quoted_string, unicode::space, output);

   if (r && iter == end)
      std::cout << "successfully parsed " << input << " to " << output << std::endl;
   else
      std::cout << "failed to parse " << input << std::endl;

   return 0;
}

Answer 1

Running this on my local machine with Boost 1.65.1, it parses successfully, with no apparent ASAN/UBSAN trips.

I bisected the commits in Spirit's Git repo and found the first breakage at the tag for 1.72.0 (SPIRIT_VERSION 0x2058).

The commit that breaks it is:

commit 16159fb335c9bb2040cf061e30fdd4deea9087e1 (HEAD)
Author: djowel <djowel@gmail.com>
Date:   Mon Aug 26 10:15:05 2019 +0800

    add invalid ascii tests + fix

This seems to have (unintentionally) regressed this case, since the input isn't actually ASCII. I'll file a bug with this analysis on the Boost Spirit repo.

If it's of any use: you can simply use Boost 1.76.0 with 16159fb335c9 reverted.

Solution 1
1, accepted, 2021-05-18 18:58:33
