Conversion from UTF-8 encoded string to bytes and vice versa in C++

In C#, we have the following functions to convert a UTF-8 encoded string to a sequence of bytes and vice versa.

  1. Encoding.UTF8.GetString(Byte[])
  2. Encoding.UTF8.GetBytes(Char[]) / Encoding.UTF8.GetBytes(String)

I am trying to achieve the same thing in C++ as follows:

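// Copies the bytes into a std::string unchanged (no charset conversion).
std::string GetStringFromBytes(const std::vector<uint8_t>& bytes){
    std::string str(bytes.begin(), bytes.end());
    return str;
}

// Copies the string's chars into a byte vector unchanged (no charset conversion).
std::vector<uint8_t> GetBytesFromString(const std::string& str){
    std::vector<uint8_t> bytes(str.begin(), str.end());
    return bytes;
}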

Is this approach correct? I'm assuming that the string that I'm converting is already in UTF-8 format.

Hello there. If you try to compile your code without the <vector> header, it will fail with an error saying that std::vector is not defined. Here is some documentation on what std::vector is: https://www.cplusplus.com/reference/vector/vector/. As for the error, you might want to include <vector> in your code if you haven't already. Another version of this question that may help you: Get bytes from std::string in C++

A C# string uses UTF-16, so Encoding.UTF8.GetBytes and Encoding.UTF8.GetString have to perform a charset conversion to and from UTF-8.
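To make that conversion step concrete, here is a rough C++ sketch of what encoding UTF-16 code units as UTF-8 bytes involves. The function name Utf16ToUtf8Bytes is hypothetical and not from the original post, and replacing unpaired surrogates with U+FFFD is a choice made for this sketch, not a statement about how any particular library behaves.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper for illustration: encodes UTF-16 code units as UTF-8 bytes.
// Unpaired surrogates are replaced with U+FFFD (an assumption of this sketch).
std::vector<std::uint8_t> Utf16ToUtf8Bytes(const std::u16string& in) {
    std::vector<std::uint8_t> out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: combine with the following low surrogate, if any.
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;
            } else {
                cp = 0xFFFD;
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // low surrogate without a preceding high surrogate
        }

        if (cp < 0x80) {            // 1 byte: 0xxxxxxx
            out.push_back(static_cast<std::uint8_t>(cp));
        } else if (cp < 0x800) {    // 2 bytes: 110xxxxx 10xxxxxx
            out.push_back(static_cast<std::uint8_t>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<std::uint8_t>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {  // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            out.push_back(static_cast<std::uint8_t>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<std::uint8_t>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<std::uint8_t>(0x80 | (cp & 0x3F)));
        } else {                    // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
            out.push_back(static_cast<std::uint8_t>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<std::uint8_t>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<std::uint8_t>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<std::uint8_t>(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}
```

For example, Utf16ToUtf8Bytes(u"\u00E9") should yield the two bytes 0xC3 0xA9. None of this work is needed when the data is already a UTF-8 encoded std::string.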

C++ std::string does not use UTF-16 (std::u16string does). So, if you have a UTF-8 encoded std::string, you already have the raw bytes for it; just copy them as-is. The code you have shown is doing exactly that, and is fine for UTF-8 strings. Otherwise, if you have or need a std::string encoded in some other charset, you will need a charset conversion to/from UTF-8. There are third-party Unicode libraries that can handle that, such as libiconv, ICU, etc.
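As a quick check of the "just copy them as-is" point, the following sketch repeats the two functions from the question with the headers they need and adds a small round-trip helper. IsLosslessRoundTrip is a hypothetical name introduced here for illustration. The sample text is written with explicit byte escapes so that the result does not depend on the source file's encoding.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Same idea as in the question: a plain byte-for-byte copy, no conversion.
std::string GetStringFromBytes(const std::vector<std::uint8_t>& bytes) {
    return std::string(bytes.begin(), bytes.end());
}

std::vector<std::uint8_t> GetBytesFromString(const std::string& str) {
    return std::vector<std::uint8_t>(str.begin(), str.end());
}

// Hypothetical helper: true if string -> bytes -> string returns the input.
bool IsLosslessRoundTrip(const std::string& utf8) {
    return GetStringFromBytes(GetBytesFromString(utf8)) == utf8;
}

// "café" spelled as UTF-8 bytes: 'c' 'a' 'f' followed by 0xC3 0xA9 for 'é'.
const std::string kSample = "caf\xC3\xA9";
```

With kSample, GetBytesFromString should return five bytes (0x63, 0x61, 0x66, 0xC3, 0xA9). Note that the byte count differs from the number of visible characters, since 'é' occupies two bytes in UTF-8.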
