
How to convert a string to valid UTF-8 in C++

I have a string output that is not necessarily valid UTF-8. I have to pass it to a method that only accepts valid UTF-8 strings.
Therefore I need to convert output to the closest valid UTF-8 string by removing the invalid bytes or parts. How can I do that in C++? I would prefer not to use a 3rd party library.

You should use the icu::UnicodeString methods fromUTF8(const StringPiece &utf8) and toUTF8String(StringClass &result).

If you're sure your string is mostly valid UTF-8 with only a few corrupt bytes, the UTF8-CPP library (http://utfcpp.sourceforge.net/) can fix that. From the page:

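#include <iterator>  // std::back_inserter
#include <string>
#include "utf8.h"    // UTF8-CPP

// Replaces each invalid UTF-8 sequence in str with U+FFFD, in place.
void fix_utf8_string(std::string& str) {
    std::string temp;
    utf8::replace_invalid(str.begin(), str.end(), std::back_inserter(temp));
    str = temp;
}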

Your requirement of not using a 3rd party library is pretty much impossible to meet when dealing with Unicode data in general, but the UTF8-CPP library is header-only, which is as light as you can get.
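
If even a header-only dependency is not an option, the narrow task of dropping ill-formed bytes can be hand-rolled with the standard library alone. The following is a minimal sketch rather than a vetted implementation. The names valid_sequence_length and strip_invalid_utf8 are hypothetical, and it assumes that "closest valid string" means deleting each offending byte (not substituting U+FFFD as replace_invalid does). It rejects overlong forms, surrogates, truncated sequences and code points above U+10FFFF.

```cpp
#include <cstddef>
#include <string>

// Returns the length (1-4) of the well-formed UTF-8 sequence that starts at
// s[i], or 0 if the bytes there do not form one. Hypothetical helper.
std::size_t valid_sequence_length(const std::string& s, std::size_t i) {
    auto byte = [&](std::size_t k) { return static_cast<unsigned char>(s[k]); };
    const unsigned char b0 = byte(i);
    if (b0 < 0x80) return 1;                       // ASCII

    std::size_t len = 0;
    unsigned char lo = 0x80, hi = 0xBF;            // allowed range of 2nd byte
    if (b0 >= 0xC2 && b0 <= 0xDF) {
        len = 2;
    } else if (b0 >= 0xE0 && b0 <= 0xEF) {
        len = 3;
        if (b0 == 0xE0) lo = 0xA0;                 // reject overlong forms
        if (b0 == 0xED) hi = 0x9F;                 // reject surrogates
    } else if (b0 >= 0xF0 && b0 <= 0xF4) {
        len = 4;
        if (b0 == 0xF0) lo = 0x90;                 // reject overlong forms
        if (b0 == 0xF4) hi = 0x8F;                 // reject > U+10FFFF
    } else {
        return 0;                                  // 0x80..0xC1 or 0xF5..0xFF
    }

    if (i + len > s.size()) return 0;              // truncated sequence
    if (byte(i + 1) < lo || byte(i + 1) > hi) return 0;
    for (std::size_t k = 2; k < len; ++k) {
        if ((byte(i + k) & 0xC0) != 0x80) return 0; // not a continuation byte
    }
    return len;
}

// Copies every well-formed sequence and skips every byte that cannot start one.
std::string strip_invalid_utf8(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size();) {
        const std::size_t len = valid_sequence_length(in, i);
        if (len == 0) {
            ++i;                                   // drop one byte, then resync
        } else {
            out.append(in, i, len);
            i += len;
        }
    }
    return out;
}
```

Usage would be along the lines of output = strip_invalid_utf8(output); before handing the string to the UTF-8-only method. Skipping a single byte on failure, rather than the whole expected sequence length, lets the scan resynchronise on the next valid lead byte.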
