Converting a byte array from UTF-16 to UTF-8

Question

I have a byte array:

uint8_t array[] = {0x00, 0x72, 0x00, 0x6f,  0x00, 0x6f, 0x00, 0x74};

I know that, as text, this is "root". I have a function that should convert UTF-16 to UTF-8. Here is the code:

inline bool convertUcs2ToUtf8(const std::vector<char> &from, std::string* const to) {
    return ucnvConvert("UTF-16", "UTF-8", from, to);
}

static inline bool ucnvConvert(const char *enc_from,
                               const char *enc_to,
                               const std::vector<char> &from,
                               std::string* const to)
{
    if (from.empty()) {
        to->clear();
        return true;
    }

    unsigned int maxOutSize = from.size() * 3 + 1;
    std::vector<char> outBuf(maxOutSize);

    iconv_t c = iconv_open(enc_to, enc_from);
    // iconv_open reports failure with (iconv_t)-1, not NULL
    ASSERT_MSG(c != (iconv_t)-1, "convert: illegal encodings");
    char *from_ptr = const_cast<char*>(from.data());
    char *to_ptr = &outBuf[0];

    size_t inleft = from.size(), outleft = maxOutSize;
    size_t n = iconv(c, &from_ptr, &inleft, &to_ptr, &outleft);
    bool success = true;
    if (n == (size_t)-1) {
        success = false;
        if (errno == E2BIG) {
            ELOG("convert: insufficient space from");
        } else if (errno == EILSEQ) {
            ELOG("convert: invalid input sequence");
        } else if (errno == EINVAL) {
            ELOG("convert: incomplete input sequence");
        }
    }
    if (success) {
        to->assign(&outBuf[0], maxOutSize - outleft);
    }
    iconv_close(c);
    return success;
}

It works for Cyrillic text (where each byte pair starts with 0x04), but when I pass in the array above, I get something like this:

爀漀漀琀開㌀㜀

and so on... What is going wrong here?

Answer 1

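You must specify the byte order of the UTF-16 input. Since you are passing a UTF-16BE (big-endian) encoded buffer, one option is to prepend the corresponding byte order mark (BOM) to it: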

uint8_t array[] = { 0xfe, 0xff, 0x00, 0x72, 0x00, 0x6f, 0x00, 0x6f, 0x00, 0x74 };
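To illustrate why the byte order matters, here is a minimal sketch that reads the same two bytes as one UTF-16 code unit in both orders. The helper names `unitBE` and `unitLE` are hypothetical and are not part of the question's code or of iconv.

```cpp
#include <cstdint>

// Interpret two consecutive bytes as one UTF-16 code unit, big-endian:
// the first byte is the high byte.
constexpr std::uint16_t unitBE(std::uint8_t first, std::uint8_t second) {
    return static_cast<std::uint16_t>((first << 8) | second);
}

// Same two bytes, little-endian: the first byte is the low byte.
constexpr std::uint16_t unitLE(std::uint8_t first, std::uint8_t second) {
    return static_cast<std::uint16_t>((second << 8) | first);
}

// The first byte pair of the question's array, read both ways.
static_assert(unitBE(0x00, 0x72) == 0x0072, "big-endian: U+0072, the letter 'r'");
static_assert(unitLE(0x00, 0x72) == 0x7200, "little-endian: U+7200, a CJK ideograph");
```

The garbled output in the question starts with U+7200, which is consistent with the converter treating the BOM-less input as little-endian.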

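However, prepending a BOM may produce UTF-8 output that also carries a byte order mark, which you probably do not want. The most efficient approach is then to specify the byte order explicitly in the encoding name: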

ucnvConvert("UTF-16BE", "UTF-8", from, to);
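For reference, a self-contained sketch of the same idea without the question's `ASSERT_MSG` and `ELOG` macros might look like the following. The function name `utf16beToUtf8` is hypothetical, and the sketch assumes the glibc-style `iconv` prototype that takes `char **` for the input buffer (some platforms declare it differently or need `-liconv` at link time).

```cpp
#include <iconv.h>

#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: converts big-endian UTF-16 bytes to UTF-8 with iconv.
// Returns true on success and stores the converted text in `out`.
bool utf16beToUtf8(const std::vector<char>& from, std::string& out) {
    out.clear();
    if (from.empty()) {
        return true;
    }

    // Note the argument order: iconv_open(tocode, fromcode).
    iconv_t cd = iconv_open("UTF-8", "UTF-16BE");
    if (cd == (iconv_t)-1) {
        return false;  // encoding pair not supported
    }

    // Two input bytes expand to at most three UTF-8 bytes, so twice the
    // input size leaves some margin.
    std::vector<char> buf(from.size() * 2);

    char* src = const_cast<char*>(from.data());
    std::size_t srcLeft = from.size();
    char* dst = buf.data();
    std::size_t dstLeft = buf.size();

    std::size_t rc = iconv(cd, &src, &srcLeft, &dst, &dstLeft);
    iconv_close(cd);
    if (rc == (std::size_t)-1) {
        return false;  // errno is E2BIG, EILSEQ, or EINVAL
    }

    out.assign(buf.data(), buf.size() - dstLeft);
    return true;
}
```

Passing the eight bytes from the question (without a BOM) to this function should yield the string "root".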

Answer 1: score 3, accepted, 2015-05-14 19:15:21
