I have a string `output` that is not necessarily valid UTF-8. I have to pass it to a method that only accepts valid UTF-8 strings.
Therefore I need to convert `output` to the closest valid UTF-8 string by removing invalid bytes or parts. How can I do that in C++? I would prefer not to use a third-party library.
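The byte-dropping behaviour described in the question can be sketched with only the standard library. This is a minimal illustration, not a vetted implementation. It assumes "closest valid string" means keeping every well-formed UTF-8 sequence (as defined by RFC 3629 and the well-formed byte sequence table in chapter 3 of the Unicode Standard) and deleting each byte that cannot start one. The function name `strip_invalid_utf8` is hypothetical.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (illustration only). Returns a copy of `in` that keeps
// only well-formed UTF-8 sequences. Overlong encodings, UTF-16 surrogates
// (U+D800..U+DFFF), code points above U+10FFFF, stray continuation bytes and
// truncated sequences are dropped. When a sequence is ill-formed, only its
// first byte is skipped and decoding resumes at the next byte.
std::string strip_invalid_utf8(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    const std::size_t n = in.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char c = static_cast<unsigned char>(in[i]);
        std::size_t len = 0;                 // 0 means "cannot start a sequence"
        unsigned char lo = 0x80, hi = 0xBF;  // allowed range of the 2nd byte
        if (c < 0x80) {
            len = 1;
        } else if (c >= 0xC2 && c <= 0xDF) {
            len = 2;
        } else if (c >= 0xE0 && c <= 0xEF) {
            len = 3;
            if (c == 0xE0) lo = 0xA0;       // reject overlong 3-byte forms
            else if (c == 0xED) hi = 0x9F;  // reject surrogates
        } else if (c >= 0xF0 && c <= 0xF4) {
            len = 4;
            if (c == 0xF0) lo = 0x90;       // reject overlong 4-byte forms
            else if (c == 0xF4) hi = 0x8F;  // reject code points > U+10FFFF
        }
        // Lead bytes 0x80..0xC1 and 0xF5..0xFF leave len == 0.
        bool ok = len != 0 && i + len <= n;
        for (std::size_t k = 1; ok && k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(in[i + k]);
            const unsigned char lower = (k == 1) ? lo : 0x80;
            const unsigned char upper = (k == 1) ? hi : 0xBF;
            ok = b >= lower && b <= upper;
        }
        if (ok) {
            out.append(in, i, len);  // copy the whole well-formed sequence
            i += len;
        } else {
            ++i;  // drop the offending byte and resynchronise
        }
    }
    return out;
}
```

Usage: `std::string clean = strip_invalid_utf8(output);`. Silently dropping bytes matches the question's wording; substituting U+FFFD (the Unicode replacement character) instead would make lost data visible to whoever reads the result.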
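You should use the `icu::UnicodeString` methods `fromUTF8(const StringPiece &utf8)` and `toUTF8String(StringClass &result)`. Converting the input with `fromUTF8` replaces ill-formed sequences with U+FFFD, so converting the result back with `toUTF8String` yields valid UTF-8.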
If you're sure your string is mostly valid UTF-8 with only a few corrupt bytes, the UTF8-CPP library (http://utfcpp.sourceforge.net/) can fix it. From its page:
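```cpp
#include <iterator>  // std::back_inserter
#include <string>
#include "utf8.h"    // UTF8-CPP (header-only)

// Replaces each invalid UTF-8 sequence in str with U+FFFD (the default replacement).
void fix_utf8_string(std::string& str) {
    std::string temp;
    utf8::replace_invalid(str.begin(), str.end(), std::back_inserter(temp));
    str.swap(temp);
}
```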
Your requirement of not using a third-party library is pretty much impossible to meet when dealing with Unicode data, but the UTF8-CPP library is header-only, which is as lightweight as it gets.