A one-file lib to convert UTF-8 (char *) to wchar_t?

Question

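I'm using the wonderful libjson. My only problem is that I need to convert a UTF-8 string (char*) to a wide-character string (wchar_t*). I googled and tried 3 different libraries, and they all failed (because of missing headers).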

I don't need anything fancy, just a one-way conversion. How can I do this?

Answer 1

If you're on Windows (which, given that you need wchar_t, you most likely are), use the MultiByteToWideChar function (declared in windows.h) like this:

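// src: UTF-8 input; src_length: its length in bytes, or -1 if src is null-terminated
// First call: ask for the required output size, in wchar_t's
int length = MultiByteToWideChar(CP_UTF8, 0, src, src_length, NULL, 0);
wchar_t *output_buffer = new wchar_t[length];
// Second call: convert (the output includes a terminating null only if src_length is -1)
MultiByteToWideChar(CP_UTF8, 0, src, src_length, output_buffer, length);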

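Alternatively, if the current locale's multibyte encoding is already UTF-8 (unlikely, but possible), use mbstowcs (stdlib.h), which converts from that encoding: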

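// First call: ask for the required length in wchar_t's, excluding the terminator
// (a null destination is a POSIX extension that MSVC also supports)
size_t length = mbstowcs(NULL, src, 0);
wchar_t * output_buffer = NULL;
if (length != (size_t)-1) {
    output_buffer = new wchar_t[length + 1];
    mbstowcs(output_buffer, src, length + 1);  // also writes the terminating L'\0'
}
// output_buffer stays NULL if src is invalid in the current locale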

Hope this helps.

Answer 2

The following successfully enabled CreateDirectoryW() to write to C:\Users\ПетрКарасев. It is basically an easier-to-understand wrapper around the previously mentioned MultiByteToWideChar.

#include <windows.h>
#include <string>
#include <system_error>

std::wstring utf16_from_utf8(const std::string & utf8)
{
    // Special case of empty input string
    if (utf8.empty())
        return std::wstring();

    // Step 1: get the length (in wchar_t's) of the resulting UTF-16 string
    const int utf16_length = ::MultiByteToWideChar(
        CP_UTF8,                          // convert from UTF-8
        0,                                // default flags
        utf8.data(),                      // source UTF-8 string
        static_cast<int>(utf8.length()),  // length (in chars) of source UTF-8 string
        NULL,                             // unused - no conversion done in this step
        0                                 // request size of destination buffer, in wchar_t's
        );
    if (utf16_length == 0)
    {
        // Error: report the Windows error code (a bare "throw;" here would call std::terminate)
        const DWORD error = ::GetLastError();
        throw std::system_error(static_cast<int>(error), std::system_category(),
                                "MultiByteToWideChar: cannot compute UTF-16 length");
    }

    // Step 2: allocate a properly sized destination buffer for the UTF-16 string
    std::wstring utf16;
    utf16.resize(utf16_length);

    // Step 3: do the actual conversion from UTF-8 to UTF-16
    if ( ! ::MultiByteToWideChar(
        CP_UTF8,                          // convert from UTF-8
        0,                                // default flags
        utf8.data(),                      // source UTF-8 string
        static_cast<int>(utf8.length()),  // length (in chars) of source UTF-8 string
        &utf16[0],                        // destination buffer
        utf16_length                      // size of destination buffer, in wchar_t's
        ) )
    {
        // Error: the conversion itself failed
        const DWORD error = ::GetLastError();
        throw std::system_error(static_cast<int>(error), std::system_category(),
                                "MultiByteToWideChar: conversion failed");
    }

    return utf16; // success
}
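MultiByteToWideChar only exists on Windows. As a hedged, portable sketch of what the CP_UTF8 conversion produces (UTF-16 code units), here is a hand-written decoder that builds a std::u16string and emits a surrogate pair for code points above U+FFFF. It is not the Windows API, the name utf16_from_utf8_portable is hypothetical, and it is less strict than a full validator (it does not reject overlong encodings).

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: decodes UTF-8 and re-encodes it as UTF-16.
// Throws std::invalid_argument on malformed input.
std::u16string utf16_from_utf8_portable(const std::string& utf8)
{
    std::u16string out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        char32_t cp;
        std::size_t extra;  // number of continuation bytes that follow
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (extra > utf8.size() - i - 1)
            throw std::invalid_argument("truncated UTF-8 sequence");
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char cont = static_cast<unsigned char>(utf8[i + k]);
            if ((cont & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (cont & 0x3F);  // append 6 payload bits
        }
        i += extra + 1;

        // Surrogate code points and values beyond U+10FFFF are not valid scalar values
        if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            throw std::invalid_argument("invalid code point");

        if (cp < 0x10000) {
            out += static_cast<char16_t>(cp);
        } else {
            cp -= 0x10000;  // remaining 20 bits are split across the pair
            out += static_cast<char16_t>(0xD800 + (cp >> 10));
            out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
        }
    }
    return out;
}
```

For example, the UTF-8 bytes F0 9F 98 80 (U+1F600) become the two code units 0xD83D 0xDE00. On Windows, where wchar_t is 16 bits, the same code units could be stored in a std::wstring instead.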

Answer 3

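Here is a piece of code I wrote. It seems to work well. It returns 0 on a UTF-8 error, or when a value is > 0xFFFF (which a 16-bit wchar_t, as on Windows, cannot hold).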

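#include <cstring>
#include <string>

// Decodes 1- to 3-byte UTF-8 sequences (code points up to U+FFFF).
// Returns a null-terminated buffer allocated with new[] (the caller must
// delete[] it), or 0 on invalid UTF-8 or a code point above U+FFFF.
wchar_t* utf8_to_wchar(const char* utf8){
    std::wstring sz;
    wchar_t c;
    // Read bytes as unsigned char so the tests below do not depend on char's signedness
    const unsigned char* p = reinterpret_cast<const unsigned char*>(utf8);
    while(*p != 0){
        unsigned char v = *p;
        if(v < 0x80){                    // 0xxxxxxx: ASCII, one byte
            c = v;
            sz += c;
            ++p;
            continue;
        }
        int shiftCount = 0;
        if((v & 0xE0) == 0xC0){          // 110xxxxx: lead byte of a 2-byte sequence
            shiftCount = 1;
            c = v & 0x1F;
        }
        else if((v & 0xF0) == 0xE0){     // 1110xxxx: lead byte of a 3-byte sequence
            shiftCount = 2;
            c = v & 0xF;
        }
        else                             // 4-byte lead (> U+FFFF) or invalid byte
            return 0;
        ++p;
        while(shiftCount){
            v = *p;                      // a terminating 0 also fails the check below
            ++p;
            if((v & 0xC0) != 0x80) return 0;  // continuation bytes must be 10xxxxxx
            c <<= 6;
            c |= (v & 0x3F);
            --shiftCount;
        }
        sz += c;
    }
    // Copy out of sz: returning sz.c_str() would leave a dangling pointer
    wchar_t* out = new wchar_t[sz.size() + 1];
    std::memcpy(out, sz.c_str(), (sz.size() + 1) * sizeof(wchar_t));
    return out;
}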
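To see the masking and shifting from the loop above on concrete bytes: the euro sign U+20AC is encoded as E2 82 AC. The lead byte 1110xxxx contributes its low 4 bits, and each continuation byte 10xxxxxx contributes 6 more bits. A minimal sketch of just this step, with a hypothetical helper name decode3:

```cpp
#include <cstdint>

// Hypothetical helper: combines a 3-byte UTF-8 sequence (1110xxxx 10xxxxxx 10xxxxxx)
// the same way the loop above does: 4 bits from the lead byte, then 6 per continuation byte.
constexpr std::uint32_t decode3(std::uint8_t b0, std::uint8_t b1, std::uint8_t b2)
{
    std::uint32_t c = b0 & 0x0F;   // 0xE2 -> 0x2
    c = (c << 6) | (b1 & 0x3F);    // 0x82 -> 0x02, c = 0x82
    c = (c << 6) | (b2 & 0x3F);    // 0xAC -> 0x2C, c = 0x20AC
    return c;
}

static_assert(decode3(0xE2, 0x82, 0xAC) == 0x20AC, "euro sign");
```

Because a 3-byte sequence carries at most 4 + 6 + 6 = 16 bits, its result always fits in a 16-bit wchar_t, which is why the function only has to give up on 4-byte sequences.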

Answer 4

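The following (untested) code shows how to convert a multibyte string in the current locale to a wide string. So if your current locale uses UTF-8, this does what you need.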

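#include <cstdlib>   // mbstowcs
#include <cstring>   // strlen
#include <iostream>  // std::cerr

const char * inputStr = ... // your UTF-8 input
size_t maxSize = strlen(inputStr) + 1;   // the result never has more wide chars than the input has bytes
wchar_t * outputWStr = new wchar_t[maxSize];
size_t result = mbstowcs(outputWStr, inputStr, maxSize);
if (result == (size_t)-1) {              // mbstowcs reports invalid input as (size_t)-1
    std::cerr << "Invalid multibyte characters in input";
}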

You can use setlocale() to set the locale.
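A hedged sketch that packages the mbstowcs approach as a function returning std::wstring (the name wide_from_locale_mb is hypothetical). It first asks for the required length by passing a null destination, which is a common POSIX extension that MSVC also supports, then converts into a buffer with room for the terminator. It only decodes UTF-8 if a UTF-8 locale is active; locale names such as "C.UTF-8" or "en_US.UTF-8" are platform-dependent assumptions, and setlocale(LC_ALL, "") takes the locale from the environment instead.

```cpp
#include <clocale>
#include <cstdlib>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: converts a string in the current locale's multibyte
// encoding (UTF-8 only if a UTF-8 locale is active) to a wide string.
std::wstring wide_from_locale_mb(const char* input)
{
    // First pass: required length in wchar_t's, excluding the terminating L'\0'
    const std::size_t length = std::mbstowcs(nullptr, input, 0);
    if (length == static_cast<std::size_t>(-1))
        throw std::invalid_argument("invalid multibyte sequence for the current locale");

    // Second pass: convert into a buffer that also has room for the terminator
    std::vector<wchar_t> buffer(length + 1);
    std::mbstowcs(buffer.data(), input, buffer.size());
    return std::wstring(buffer.data(), length);
}
```

Typical use is to call std::setlocale(LC_ALL, "C.UTF-8") (or another UTF-8 locale available on your system) once at startup and check that it did not return a null pointer before converting.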


Answer 1: 8 votes, posted 2011-04-09 08:10:23
Answer 2: 1 vote, posted 2013-05-07 04:58:58
Answer 3: 0 votes
Answer 4: 0 votes, posted 2011-04-09 10:38:23
