
How to convert UTF8 char array to Windows 1252 char array

I am new to C++, so I apologize if this is a basic question.

I have a piece of text: Павло

I get it from console output in a piece of code I am working on. I know that a Cyrillic word is hidden behind it. Its real value is "Петро".

Using an online encoding detector, I found that to read this text properly, I have to convert it from UTF-8 to Windows 1252.

How can I do it in code?

I have tried this. It gives some result, but it outputs 5 question marks (at least the length is what I expected):

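#include <cstring>
#include <cwchar>
#include <iostream>
#include <Windows.h>

// Converts a string in the given code page to a newly allocated wide string.
// The caller owns the result and must delete[] it.
wchar_t *CodePageToUnicode(int codePage, const char *src)
{
    if (!src) return 0;
    int srcLen = strlen(src);
    if (!srcLen)
    {
        wchar_t *w = new wchar_t[1];
        w[0] = 0;
        return w;
    }

    // First call: ask only for the required number of wide characters.
    int requiredSize = MultiByteToWideChar(codePage, 0,
        src, srcLen, 0, 0);

    if (!requiredSize)
        return 0;

    wchar_t *w = new wchar_t[requiredSize + 1];
    w[requiredSize] = 0;

    // Second call: perform the conversion into the allocated buffer.
    int retval = MultiByteToWideChar(codePage, 0,
        src, srcLen, w, requiredSize);
    if (!retval)
    {
        delete[] w;
        return 0;
    }

    return w;
}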

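// Converts a wide string to a newly allocated string in the given code page.
// The caller owns the result and must delete[] it.
char *UnicodeToCodePage(int codePage, const wchar_t *src)
{
    if (!src) return 0;
    int srcLen = wcslen(src);
    if (!srcLen)
    {
        char *x = new char[1];
        x[0] = '\0';
        return x;
    }

    // First call: ask only for the required number of bytes.
    int requiredSize = WideCharToMultiByte(codePage, 0,
        src, srcLen, 0, 0, 0, 0);

    if (!requiredSize)
        return 0;

    char *x = new char[requiredSize + 1];
    x[requiredSize] = 0;

    // Second call: perform the conversion into the allocated buffer.
    int retval = WideCharToMultiByte(codePage, 0,
        src, srcLen, x, requiredSize, 0, 0);
    if (!retval)
    {
        delete[] x;
        return 0;
    }

    return x;
}

int main()
{
    const char *text = "Павло";

    // Now convert utf-8 back to ANSI:
    wchar_t *wText2 = CodePageToUnicode(65001, text);

    char *ansiText = UnicodeToCodePage(1252, wText2);
    std::cout << ansiText;

    delete[] ansiText;
    delete[] wText2;
    return 0;
}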


I also tried this, but it does not work properly:

int main()
{
    const char *orig = "Павло";
    size_t origsize = strlen(orig) + 1;
    const size_t newsize = 100;
    size_t convertedChars = 0;
    wchar_t wcstring[newsize];
    mbstowcs_s(&convertedChars, wcstring, origsize, orig, _TRUNCATE);
    wcscat_s(wcstring, L" (wchar_t *)");

    std::wstring strUTF(wcstring);

    const wchar_t* szWCHAR = strUTF.c_str();

    cout << szWCHAR << '\n';

    char *buffer = new char[origsize / 2 + 1];

    WideCharToMultiByte(CP_ACP, 0, szWCHAR, -1, buffer, 256, NULL, NULL);

    cout << buffer;
}

There are a few options:

  1. Using Windows API

    Convert your UTF-8 to the system's UTF-16LE using MultiByteToWideChar, and then from UTF-16LE to CP1251 (Cyrillic is 1251, not 1252) using WideCharToMultiByte.


  3. Using GNU ICONV library

  4. Using IBM ICU
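
As a rough illustration of option 3 from the list above, the same UTF-8 to CP1251 conversion can be sketched with iconv. This is only a sketch under assumptions, not code from the answer: `convert_encoding` and `check_utf8_to_cp1251` are hypothetical helper names, the encoding names "UTF-8" and "CP1251" are assumed to be available in the installed iconv, and the `char**` input parameter matches glibc and musl (some other platforms declare it `const char**`).

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <iconv.h>
#include <stdexcept>
#include <string>

// Hypothetical helper (not from the thread): converts `input` from the
// encoding named `from` to the encoding named `to` using iconv.
std::string convert_encoding(const std::string& input, const char* from, const char* to)
{
    iconv_t cd = iconv_open(to, from);  // note the order: target encoding first
    if (cd == (iconv_t)-1)
        throw std::runtime_error("iconv_open: unsupported conversion");

    std::string output;
    char chunk[256];
    // Assumption: iconv takes char** for the input (glibc, musl).
    // It advances the pointer but does not modify the input bytes.
    char* in = const_cast<char*>(input.data());
    std::size_t inLeft = input.size();

    while (inLeft > 0) {
        char* out = chunk;
        std::size_t outLeft = sizeof chunk;
        const std::size_t rc = iconv(cd, &in, &inLeft, &out, &outLeft);
        const int err = errno;  // save before any other call can change it
        output.append(chunk, sizeof chunk - outLeft);
        // E2BIG only means the chunk filled up; loop again for the rest.
        if (rc == (std::size_t)-1 && err != E2BIG) {
            iconv_close(cd);
            throw std::runtime_error("iconv: invalid or unconvertible input");
        }
    }
    iconv_close(cd);
    return output;
}

// Usage check with a Cyrillic word written as explicit bytes, so the
// result does not depend on how this source file is saved.
void check_utf8_to_cp1251()
{
    const std::string utf8 = "\xD0\x9F\xD0\xB0\xD0\xB2\xD0\xBB\xD0\xBE";  // "Павло" in UTF-8
    const std::string cp1251 = convert_encoding(utf8, "UTF-8", "CP1251");
    assert(cp1251 == "\xCF\xE0\xE2\xEB\xEE");  // the same word in Windows-1251
}
```

Unlike the Windows API route, this goes from one byte encoding to the other in a single call, with no intermediate UTF-16 string. The bytes it returns only display as Cyrillic in a terminal or console that is itself set to code page 1251.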

If you simply need to output your Unicode text to the console, check this

This is a printing issue. Your first function is correct; you can test it with MessageBoxW:

wchar_t *wbuf = CodePageToUnicode(CP_UTF8, "Павло");
MessageBoxW(0, wbuf, 0, 0);

Output:

"Павло" (not the same as what you said!)

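Both the "Павло" result and the five question marks from the question can be reproduced without the Windows API. The sketch below is a minimal, portable illustration and not the code either poster used: `decode_utf8`, `utf8_to_cp1252`, and `check_thread_examples` are hypothetical names, and the decoder assumes well-formed UTF-8. It also assumes that the bytes handed to `CodePageToUnicode` were already the UTF-8 encoding of "Павло", which is what the MessageBoxW result suggests. Under that assumption, the second step asks Windows-1252 to represent Cyrillic letters it does not contain, so each one becomes `?`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Decodes UTF-8 into Unicode code points.
// Sketch only: assumes well-formed input and does no validation.
std::vector<char32_t> decode_utf8(const std::string& s)
{
    std::vector<char32_t> out;
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char lead = static_cast<unsigned char>(s[i++]);
        // Number of continuation bytes implied by the lead byte.
        const int extra = lead < 0x80 ? 0 : lead < 0xE0 ? 1 : lead < 0xF0 ? 2 : 3;
        char32_t cp = extra == 0 ? lead : (lead & (0x3F >> extra));
        for (int k = 0; k < extra && i < s.size(); ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(s[i++]) & 0x3F);
        out.push_back(cp);
    }
    return out;
}

// Code points assigned to Windows-1252 bytes 0x80..0x9F (0 = unassigned).
const char32_t kCp1252High[32] = {
    0x20AC, 0,      0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0,      0x017D, 0,
    0,      0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0,      0x017E, 0x0178};

// Re-encodes UTF-8 text as Windows-1252, writing '?' for every character
// that Windows-1252 cannot represent.
std::string utf8_to_cp1252(const std::string& utf8)
{
    std::string out;
    for (char32_t cp : decode_utf8(utf8)) {
        char byte = '?';
        if (cp < 0x80 || (cp >= 0xA0 && cp <= 0xFF)) {
            byte = static_cast<char>(cp);  // ASCII and Latin-1 map one to one
        } else {
            for (int k = 0; k < 32; ++k)
                if (kCp1252High[k] != 0 && kCp1252High[k] == cp)
                    byte = static_cast<char>(0x80 + k);
        }
        out.push_back(byte);
    }
    return out;
}

// The two cases from the thread, written as explicit bytes.
void check_thread_examples()
{
    const std::string cyrillic = "\xD0\x9F\xD0\xB0\xD0\xB2\xD0\xBB\xD0\xBE";  // "Павло" in UTF-8
    // Cyrillic has no Windows-1252 representation: five question marks.
    assert(utf8_to_cp1252(cyrillic) == "?????");

    // The garbled text itself, stored as UTF-8.
    const std::string garbled =
        "\xC3\x90\xC5\xB8\xC3\x90\xC2\xB0\xC3\x90\xC2\xB2\xC3\x90\xC2\xBB\xC3\x90\xC2\xBE";
    // Re-encoding it as Windows-1252 yields the UTF-8 bytes of "Павло".
    assert(utf8_to_cp1252(garbled) == cyrillic);
}
```

The second check is the "UTF-8 to Windows 1252" step that the online detector suggested: it recovers the original UTF-8 bytes, which still have to be shown by something that interprets them as UTF-8 or converted once more to the console's code page.
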
You can print wide characters with std::wcout, or simplify the function to print using the 1251 code page as follows:

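#include <iostream>
#include <string>
#include <Windows.h>

int main()
{
    // These bytes are treated as the UTF-8 encoding of the Cyrillic text.
    const char *buf = "Павло";
    int size;

    // UTF-8 -> UTF-16. Passing -1 as the length makes size include the terminating null.
    size = MultiByteToWideChar(CP_UTF8, 0, buf, -1, 0, 0);
    std::wstring wstr(size, 0);
    MultiByteToWideChar(CP_UTF8, 0, buf, -1, &wstr[0], size);

    // UTF-16 -> Windows-1251 (Cyrillic).
    int codepage = 1251;
    size = WideCharToMultiByte(codepage, 0, &wstr[0], -1, 0, 0, 0, 0);
    std::string str(size, 0);
    WideCharToMultiByte(codepage, 0, &wstr[0], -1, &str[0], size, 0, 0);
    if (!str.empty()) str.pop_back();  // drop the terminating null counted in size

    // Shows as Cyrillic only if the console output code page is 1251.
    std::cout << str << "\n";
    return 0;
}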
