How to convert a UTF-8 char array to a Windows-1252 char array

I am a noob in C++, so I am sorry for asking a stupid question.

I have a piece of text: Павло

I get it from console output somewhere in a piece of code I am working on. I know that there is a Cyrillic word hidden behind it. Its real value is "Петро".

Using an online encoding detector, I found that to read this text properly, I have to convert it from UTF-8 to Windows-1252.

How can I do this in code?

I have tried the following. It gives some result, but it outputs 5 question marks (at least the length is as expected):

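#include <windows.h>
#include <conio.h>
#include <cstring>
#include <cwchar>
#include <iostream>

using namespace std;

wchar_t *CodePageToUnicode(int codePage, const char *src)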
{
    if (!src) return 0;
    int srcLen = strlen(src);
    if (!srcLen)
    {
        wchar_t *w = new wchar_t[1];
        w[0] = 0;
        return w;
    }

    int requiredSize = MultiByteToWideChar(codePage,
        0,
        src, srcLen, 0, 0);

    if (!requiredSize)
    {
        return 0;
    }

    wchar_t *w = new wchar_t[requiredSize + 1];
    w[requiredSize] = 0;

    int retval = MultiByteToWideChar(codePage,
        0,
        src, srcLen, w, requiredSize);
    if (!retval)
    {
        delete[] w;
        return 0;
    }

    return w;
}

char *UnicodeToCodePage(int codePage, const wchar_t *src)
{
    if (!src) return 0;
    int srcLen = wcslen(src);
    if (!srcLen)
    {
        char *x = new char[1];
        x[0] = '\0';
        return x;
    }

    int requiredSize = WideCharToMultiByte(codePage,
        0,
        src, srcLen, 0, 0, 0, 0);

    if (!requiredSize)
    {
        return 0;
    }

    char *x = new char[requiredSize + 1];
    x[requiredSize] = 0;

    int retval = WideCharToMultiByte(codePage,
        0,
        src, srcLen, x, requiredSize, 0, 0);
    if (!retval)
    {
        delete[] x;
        return 0;
    }

    return x;
}
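
int main()
{
    const char *text = "Павло";

    // Now convert utf-8 back to ANSI:
    wchar_t *wText2 = CodePageToUnicode(65001, text); // 65001 == CP_UTF8

    char *ansiText = UnicodeToCodePage(1252, wText2);
    if (ansiText)
        cout << ansiText;
    _getch();

    // Both helpers allocate with new[], so release with delete[].
    delete[] ansiText;
    delete[] wText2;
}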

I also tried this, but it doesn't work properly:

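#include <windows.h>
#include <conio.h>
#include <cstdlib>
#include <cstring>
#include <cwchar>
#include <iostream>
#include <string>

using namespace std;

int main()
{
    const char *orig = "Павло";
    const size_t newsize = 100;
    size_t convertedChars = 0;
    wchar_t wcstring[newsize];
    // mbstowcs_s decodes using the current C locale, not necessarily UTF-8.
    mbstowcs_s(&convertedChars, wcstring, newsize, orig, _TRUNCATE);
    wcscat_s(wcstring, L" (wchar_t *)");

    std::wstring strUTF(wcstring);

    const wchar_t* szWCHAR = strUTF.c_str();

    // cout << szWCHAR would print the pointer value, so use wcout.
    wcout << szWCHAR << L'\n';

    // The buffer size passed to WideCharToMultiByte must match the allocation.
    const int bufsize = 256;
    char *buffer = new char[bufsize];

    WideCharToMultiByte(CP_ACP, 0, szWCHAR, -1, buffer, bufsize, NULL, NULL);

    cout << buffer;
    delete[] buffer;
    _getch();
}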

There are a few options:

  1. Using the Windows API

    Convert your UTF-8 to the system's UTF-16LE with MultiByteToWideChar, and then convert from UTF-16LE to CP1251 (Cyrillic is 1251, not 1252) with WideCharToMultiByte.

  2. Using the Microsoft MLang API

  3. Using the GNU iconv library

  4. Using IBM ICU

If you simply need to output your Unicode text to the console, check this.

This is a printing issue. Your first function is correct; you can test it with MessageBoxW:

wchar_t *wbuf = CodePageToUnicode(CP_UTF8, "Павло");
if (wbuf)
{
    MessageBoxW(0, wbuf, 0, 0);
    delete[] wbuf; // free the buffer allocated with new[] in CodePageToUnicode
}

Output:

"Павло" (not the same as what you said!)

You can print wide characters with std::wcout, or simplify the function and print using the 1251 code page, as follows:

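#include <iostream>
#include <string>
#include <Windows.h>

int main()
{
    // UTF-8 input. This assumes the source file is saved as UTF-8 and the
    // compiler keeps the literal as UTF-8 (e.g. MSVC with /utf-8).
    const char *buf = "Павло";
    int size;

    // UTF-8 -> UTF-16. With -1 the returned size includes the terminating null.
    size = MultiByteToWideChar(CP_UTF8, 0, buf, -1, 0, 0);
    if (size == 0) return 1;
    std::wstring wstr(size, 0);
    MultiByteToWideChar(CP_UTF8, 0, buf, -1, &wstr[0], size);
    wstr.resize(size - 1); // drop the copied null terminator

    // UTF-16 -> Windows-1251 (the Cyrillic code page).
    int codepage = 1251;
    size = WideCharToMultiByte(codepage, 0, wstr.c_str(), -1, 0, 0, 0, 0);
    if (size == 0) return 1;
    std::string str(size, 0);
    WideCharToMultiByte(codepage, 0, wstr.c_str(), -1, &str[0], size, 0, 0);
    str.resize(size - 1); // drop the copied null terminator

    // Make the console interpret the output bytes as code page 1251.
    SetConsoleOutputCP(codepage);
    std::cout << str << "\n";
    return 0;
}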

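Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.

Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM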