

How can I convert a std::string to UTF-8?

I need to put the contents of a std::stringstream as a value in a JSON document (using the rapidjson library), but std::stringstream::str doesn't work because it does not return UTF-8 characters. How can I do that?

Example: d["key"].SetString(tmp_stream.str());

rapidjson::Value::SetString does not take a std::string on its own; it accepts a pointer and a length (plus an allocator if the string should be copied into the document). So you have to call it this way:

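std::string stream_data = tmp_stream.str();
// The allocator argument makes RapidJSON copy the bytes, so nothing dangles once stream_data is gone.
d["key"].SetString(stream_data.data(), static_cast<rapidjson::SizeType>(stream_data.size()), d.GetAllocator());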

As others have mentioned in the comments, std::string is a container of char values with no encoding specified. It can hold UTF-8 encoded bytes or bytes in any other encoding.
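
One quick way to see this is to dump the raw bytes. The sketch below uses a hypothetical helper, hex_bytes (not part of any library), to show that the same character "é" (U+00E9) takes two bytes in UTF-8 but only one byte in Windows-1252 or ISO-8859-1. std::string accepts either form without complaint.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: returns the raw bytes of s as space-separated
// uppercase hex, e.g. "C3 A9". std::string records bytes only, so this
// is the only "encoding information" it actually carries.
std::string hex_bytes(const std::string& s) {
    std::string out;
    char buf[4];  // two hex digits plus the terminating NUL
    for (unsigned char c : s) {
        std::snprintf(buf, sizeof buf, "%02X", static_cast<unsigned>(c));
        if (!out.empty()) out += ' ';
        out += buf;
    }
    return out;
}
```

For example, hex_bytes("\xC3\xA9") (UTF-8 "é") returns "C3 A9", while hex_bytes("\xE9") (Windows-1252 "é") returns "E9".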

I tested putting invalid UTF-8 data in a std::string and calling SetString. RapidJSON accepted the data and simply replaced the invalid characters with "?". If that's what you're seeing, then you need to:

  1. Determine what encoding your string has
  2. Re-encode the string as UTF-8
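
For step 1, an arbitrary encoding can't be detected reliably. You can, however, check whether the bytes are well-formed UTF-8. If they are not, the data is in some other encoding and needs converting. Below is a minimal sketch of such a check. is_valid_utf8 is a hypothetical helper written from the UTF-8 rules in RFC 3629, not a RapidJSON function.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: true if s is well-formed UTF-8 per RFC 3629.
// Rejects stray continuation bytes, truncated sequences, overlong forms,
// UTF-16 surrogates (U+D800-U+DFFF) and code points above U+10FFFF.
bool is_valid_utf8(const std::string& s) {
    std::size_t i = 0;
    const std::size_t n = s.size();
    while (i < n) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        if (c < 0x80) { ++i; continue; }   // 1-byte sequence (ASCII)

        std::size_t len = 0;
        char32_t cp = 0;
        if ((c & 0xE0) == 0xC0)      { len = 2; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }
        else return false;                 // continuation byte or 0xF8-0xFF as lead

        if (len > n - i) return false;     // sequence cut off at end of string
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80) return false;  // expected 10xxxxxx
            cp = (cp << 6) | (cc & 0x3F);
        }

        // Each sequence length has a minimum code point; anything smaller is overlong.
        static const char32_t kMin[5] = {0, 0, 0x80, 0x800, 0x10000};
        if (cp < kMin[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;
        i += len;
    }
    return true;
}
```

A lone Windows-1252 "é" (the single byte 0xE9) fails the check, while its UTF-8 form "\xC3\xA9" passes. A pure-ASCII string also passes, because ASCII text is already valid UTF-8.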

If your string is ASCII, then SetString will work fine, because ASCII is a subset of UTF-8: every ASCII string is already valid UTF-8.
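
Checking for the ASCII case is cheap, because every byte must be below 0x80. Here is a minimal sketch. is_ascii is a hypothetical helper name.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: true if every byte is 7-bit ASCII (0x00-0x7F).
// Such a string can be handed to SetString unchanged.
bool is_ascii(const std::string& s) {
    return std::all_of(s.begin(), s.end(), [](char c) {
        return static_cast<unsigned char>(c) < 0x80;
    });
}
```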

If your string is UTF-16 or UTF-32 encoded, several lightweight, portable libraries can re-encode it, such as utfcpp. C++11 added an API for this (std::wstring_convert together with std::codecvt_utf8 / std::codecvt_utf8_utf16), but it was poorly supported and has been deprecated since C++17.
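
If you would rather avoid both a dependency and the deprecated std::wstring_convert, the conversion is small enough to write by hand. The sketch below assumes the input is already in a std::u16string or std::u32string. The function names are hypothetical, and this is not utfcpp's API. Replacing unpaired surrogates and out-of-range values with U+FFFD is one common policy, not the only possible one. (On Windows, std::wstring holds UTF-16. On most Linux systems, wchar_t is 32 bits and holds UTF-32.)

```cpp
#include <cstddef>
#include <string>

// Appends the UTF-8 encoding of one code point to out. Surrogates and
// values above U+10FFFF are replaced by U+FFFD REPLACEMENT CHARACTER.
void append_utf8(std::string& out, char32_t cp) {
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) cp = 0xFFFD;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// UTF-32 -> UTF-8: each element is already a whole code point.
std::string utf32_to_utf8(const std::u32string& in) {
    std::string out;
    for (char32_t cp : in) append_utf8(out, cp);
    return out;
}

// UTF-16 -> UTF-8: surrogate pairs are combined into one code point first.
std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cu = in[i];
        if (cu >= 0xD800 && cu <= 0xDBFF && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // High surrogate followed by low surrogate.
            char32_t cp = 0x10000 + ((cu - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            append_utf8(out, cp);
            ++i;
        } else {
            append_utf8(out, cu);  // BMP character, or lone surrogate -> U+FFFD
        }
    }
    return out;
}
```

For example, utf16_to_utf8(u"\U0001F600") yields the four bytes F0 9F 98 80. That result is a std::string that can be passed to SetString.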

If your string is encoded with an older, legacy encoding such as Windows-1252, you might need to use either an OS API such as MultiByteToWideChar on Windows, or a heavyweight Unicode library such as LibICU, to convert the data to a more standard encoding.
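
For a single-byte code page, a lookup table can be enough. The sketch below converts Windows-1252 to UTF-8 by hand. It is not derived from MultiByteToWideChar, and cp1252_to_utf8 is a hypothetical name. Mapping the five bytes that Windows-1252 leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D) to U+FFFD is an assumption, because other converters treat them differently.

```cpp
#include <string>

// Windows-1252 bytes 0x80-0x9F map to these code points. 0xFFFD marks the
// five bytes that the code page leaves undefined (an assumed policy).
static const char16_t kCp1252High[32] = {
    0x20AC, 0xFFFD, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0xFFFD, 0x017D, 0xFFFD,
    0xFFFD, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0xFFFD, 0x017E, 0x0178,
};

// Hypothetical helper: re-encodes a Windows-1252 byte string as UTF-8.
std::string cp1252_to_utf8(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (char ch : in) {
        unsigned char b = static_cast<unsigned char>(ch);
        // 0x00-0x7F and 0xA0-0xFF equal their Unicode code points (as in Latin-1).
        char32_t cp = (b >= 0x80 && b <= 0x9F) ? kCp1252High[b - 0x80] : b;
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {  // every Windows-1252 character is in the BMP: at most 3 bytes
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

For example, cp1252_to_utf8("caf\xE9") returns "caf\xC3\xA9" ("café" in UTF-8), and the euro sign byte 0x80 becomes E2 82 AC.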
