
Why does boost::spirit::unicode::char_ no longer work with UTF-8 char* strings?

With boost version 1.60 I could use #define BOOST_SPIRIT_UNICODE and boost::spirit::unicode::char_ to process UTF-8 input strings without any further preprocessing. With boost version 1.72 this fails with an exception.

The solution seems to be to use boost::u8_to_u32_iterator and let spirit work with wide strings. But why did it work so flawlessly in the earlier version and if possible how can I reactivate the old behavior?

Here is some sample code:

#define BOOST_SPIRIT_UNICODE
#include <boost/spirit/include/qi.hpp>
#include <iostream>
#include <string>

int main()
{
   typedef std::string::const_iterator iterator_type;
   namespace qi = boost::spirit::qi;
   namespace unicode = boost::spirit::unicode;

   std::string input("\"Test ⏳\"");
   qi::rule<iterator_type, std::string(), unicode::space_type> quoted_string = qi::lexeme['"' >> +(unicode::char_ - '"') >> '"'];

   iterator_type iter = input.begin();
   iterator_type end = input.end();
   std::string output;
   bool r = phrase_parse(iter, end, quoted_string, unicode::space, output);

   if (r && iter == end)
      std::cout << "successfully parsed " << input << " to " << output << std::endl;
   else
      std::cout << "failed to parse " << input << std::endl;

   return 0;
}

On my local box with Boost 1.65.1, this parses successfully and without any apparent ASan/UBSan reports.

I bisected the commits in the Spirit Git repo and found the first breakage at the tag for 1.72.0 (SPIRIT_VERSION 0x2058).

I found the commit that breaks it was

commit 16159fb335c9bb2040cf061e30fdd4deea9087e1 (HEAD)
Author: djowel <djowel@gmail.com>
Date:   Mon Aug 26 10:15:05 2019 +0800

    add invalid ascii tests + fix

That commit seems to have (unintentionally) regressed this case, because the input is not in fact ASCII. I would file a bug with this analysis at the Boost Spirit repo.

If it is of any use to you, using Boost 1.76.0 with 16159fb335c9 reverted works fine.
