
How can I unescape a UTF-8 string in C++?

This is a little different from the many cases I found while searching.

I receive a string such as the following:

std::string str = "\\u8f93\\u5165\\u7684";

How can I parse the escape sequences to construct an actual UTF-8 string?

It's a simple parse-and-convert job. For example, it could be done this way:

#include <codecvt>
#include <iostream>
#include <locale>   // std::wstring_convert
#include <string>
#include <boost/spirit/include/qi.hpp>
namespace qi = boost::spirit::qi;
int main()
{
    std::string str = "\\u8f93\\u5165\\u7684";
    std::u16string u16;
    // Parse zero or more "\u" + hex-number sequences into UTF-16 code units.
    qi::parse(str.begin(), str.end(), *("\\u" >> qi::hex), u16);
    // Convert UTF-16 to UTF-8 (std::wstring_convert is deprecated since C++17).
    std::string u8 = std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t>().to_bytes(u16);
    std::cout << "UTF-8 string " << u8 << " consisting of " << u8.size() << " bytes\n";
}

Live at coliru http://coliru.stacked-crooked.com/a/62efb680a3d27a60
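
If Boost is not available, or the deprecated <codecvt> facilities are a concern, the same parse-and-convert idea can be sketched with the standard library only. This is a minimal sketch, not the answer's method: the helper names append_utf8, read_hex4 and unescape_utf8 are made up for illustration, the input is assumed to use exactly four hex digits after each \u, and anything that is not a well-formed \uXXXX escape is copied through unchanged.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Append the UTF-8 encoding of code point cp to out.
// Note: a lone surrogate (U+D800..U+DFFF) is not rejected here; it would be
// encoded as three bytes, which is not valid UTF-8.
void append_utf8(std::string& out, std::uint32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Read exactly four hex digits starting at s[pos] into value.
// Returns false if fewer than four characters remain or one is not a hex digit.
bool read_hex4(const std::string& s, std::size_t pos, std::uint32_t& value) {
    if (pos + 4 > s.size()) return false;
    std::uint32_t v = 0;
    for (std::size_t i = 0; i < 4; ++i) {
        const char c = s[pos + i];
        std::uint32_t digit;
        if (c >= '0' && c <= '9')      digit = static_cast<std::uint32_t>(c - '0');
        else if (c >= 'a' && c <= 'f') digit = static_cast<std::uint32_t>(c - 'a' + 10);
        else if (c >= 'A' && c <= 'F') digit = static_cast<std::uint32_t>(c - 'A' + 10);
        else return false;
        v = (v << 4) | digit;
    }
    value = v;
    return true;
}

// True if s has a well-formed \uXXXX escape at pos; stores its value in unit.
bool read_escape(const std::string& s, std::size_t pos, std::uint32_t& unit) {
    return pos + 1 < s.size() && s[pos] == '\\' && s[pos + 1] == 'u' &&
           read_hex4(s, pos + 2, unit);
}

// Replace every \uXXXX escape in `in` with the UTF-8 bytes of its code point.
// A high surrogate followed directly by a low surrogate is combined first.
std::string unescape_utf8(const std::string& in) {
    std::string out;
    std::size_t i = 0;
    while (i < in.size()) {
        std::uint32_t cp = 0;
        if (read_escape(in, i, cp)) {
            i += 6;  // length of "\uXXXX"
            std::uint32_t low = 0;
            if (cp >= 0xD800 && cp <= 0xDBFF && read_escape(in, i, low) &&
                low >= 0xDC00 && low <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
                i += 6;
            }
            append_utf8(out, cp);
        } else {
            out += in[i++];  // not an escape: copy the byte as-is
        }
    }
    return out;
}
```

Calling unescape_utf8(str) on the string from the question should yield a nine-byte std::string, three bytes for each of the three characters.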

Note: this answer was posted before a clarifying edit was added to the question.


Just write the characters directly in the string literal, like "输入的".

Or:

#include <string>
std::string s(u8"\u8f93\u5165\u7684");  // compiles before C++20; since C++20 a u8 literal has type const char8_t[]
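
Note that a literal like the one above is translated by the compiler, whereas the string in the question arrives at run time with a literal backslash and 'u' in it. The following sketch only illustrates that difference; the function names literal_form and received_form are hypothetical, and the byte content of literal_form() assumes the compiler's execution character set is UTF-8 (the default for GCC and Clang).

```cpp
#include <string>

// The compiler replaces each \uXXXX with the encoded character,
// so this string holds the three characters themselves.
inline std::string literal_form() { return "\u8f93\u5165\u7684"; }

// Here the backslashes are escaped, so the string holds the 18 ASCII
// characters \u8f93\u5165\u7684 and still needs to be unescaped at run time.
inline std::string received_form() { return "\\u8f93\\u5165\\u7684"; }
```

This is why a literal-based approach does not help with input received at run time: received_form() still has to be parsed.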
