
Conversion from UTF-8 encoded string to bytes and vice versa in C++

In C#, we have the following methods to convert a UTF-8 encoded string to a sequence of bytes and vice versa:

  1. Encoding.UTF8.GetString(Byte[])
  2. Encoding.UTF8.GetBytes(Char[]) / Encoding.UTF8.GetBytes(String)

I am trying to achieve the same thing in C++ as follows:

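#include <cstdint>
#include <string>
#include <vector>

// Copies the raw bytes into a std::string; no decoding or validation is performed.
std::string GetStringFromBytes(const std::vector<std::uint8_t>& bytes){
    return std::string(bytes.begin(), bytes.end());
}

// Copies the string's bytes out as-is; for a UTF-8 string these are its UTF-8 bytes.
std::vector<std::uint8_t> GetBytesFromString(const std::string& str){
    return std::vector<std::uint8_t>(str.begin(), str.end());
}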

Is this approach correct? I'm assuming that the string that I'm converting is already in UTF-8 format.

Hello there. If the snippet is compiled without the necessary headers, it will fail with an error saying that std::vector is not defined. Here is some documentation on what std::vector is: https://www.cplusplus.com/reference/vector/vector/. To fix that error, include the <vector> header in your code if you haven't already. Another version of this question that may help you is: Get bytes from std::string in C++

C# string uses UTF-16, and thus requires a charset conversion to/from UTF-8.
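To illustrate the kind of conversion involved, here is a minimal sketch of a UTF-16 to UTF-8 encoder in standard C++. The names Utf16ToUtf8 and Utf16ToUtf8Demo are hypothetical and not from the original post, and throwing on an unpaired surrogate is a choice made for this sketch, not a description of how .NET behaves. For real projects, an established Unicode library is likely a better fit.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: encodes a UTF-16 string as UTF-8 bytes held in a std::string.
// Throws std::invalid_argument on an unpaired surrogate (a choice made for this sketch).
std::string Utf16ToUtf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: it must be followed by a low surrogate.
            if (i + 1 >= in.size() || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF)
                throw std::invalid_argument("unpaired high surrogate");
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;  // the low surrogate has been consumed
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            throw std::invalid_argument("unpaired low surrogate");
        }

        if (cp < 0x80) {            // 1 byte: 0xxxxxxx
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {    // 2 bytes: 110xxxxx 10xxxxxx
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {  // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                    // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// Usage sketch: call from main() to check a few known encodings.
void Utf16ToUtf8Demo() {
    assert(Utf16ToUtf8(u"A") == "A");
    assert(Utf16ToUtf8(u"\u00E9") == "\xC3\xA9");              // U+00E9
    assert(Utf16ToUtf8(u"\u20AC") == "\xE2\x82\xAC");          // U+20AC
    assert(Utf16ToUtf8(u"\U0001F600") == "\xF0\x9F\x98\x80");  // surrogate pair
}
```

The result is an ordinary std::string whose contents are the UTF-8 bytes, so no further conversion is needed to obtain the byte sequence.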

C++ std::string does not use UTF-16 (std::u16string does). So, if you have a UTF-8 encoded std::string, you already have the raw bytes for it; just copy them as-is. The code you have shown is doing exactly that, and is fine for UTF-8 strings. Otherwise, if you have or need a std::string encoded in some other charset, you will need a charset conversion to/from UTF-8. There are 3rd party Unicode libraries that can handle that, such as libiconv, ICU, etc.
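The byte-for-byte copy can be checked with a small round trip. The sketch below repeats the two functions from the question (taking the vector by const reference) so that it compiles on its own. CountCodePoints and RoundTripDemo are hypothetical helpers added for illustration, and CountCodePoints assumes its input is well-formed UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Same copy as in the question, repeated so the sketch is self-contained.
std::vector<std::uint8_t> GetBytesFromString(const std::string& str) {
    return std::vector<std::uint8_t>(str.begin(), str.end());
}

std::string GetStringFromBytes(const std::vector<std::uint8_t>& bytes) {
    return std::string(bytes.begin(), bytes.end());
}

// Hypothetical helper: counts code points in well-formed UTF-8 by skipping
// continuation bytes, which always have the bit pattern 10xxxxxx.
std::size_t CountCodePoints(const std::string& utf8) {
    std::size_t count = 0;
    for (unsigned char c : utf8) {
        if ((c & 0xC0) != 0x80) ++count;
    }
    return count;
}

// Usage sketch: call from main() to check the round trip.
void RoundTripDemo() {
    const std::string s = "h\xC3\xA9llo";  // "héllo" written as explicit UTF-8 bytes

    const std::vector<std::uint8_t> bytes = GetBytesFromString(s);
    assert(bytes.size() == 6);                       // size() counts bytes
    assert(bytes[1] == 0xC3 && bytes[2] == 0xA9);    // the two bytes of U+00E9
    assert(CountCodePoints(s) == 5);                 // but only 5 code points
    assert(GetStringFromBytes(bytes) == s);          // lossless round trip
}
```

Note that size() on either container reports bytes, not characters, which is why the byte count and the code point count differ for non-ASCII text.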


 