
UTF-8 output on Windows console

The following code shows unexpected behaviour on my machine (tested with Visual C++ 2008 SP1 on Windows XP and VS 2012 on Windows 7):

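#include <cstdio>
#include <iostream>
#include <windows.h>

int main() {
    SetConsoleOutputCP( CP_UTF8 );
    std::cout << "\xc3\xbc";                 // U+00FC (ü) as UTF-8, via iostreams
    int fail = std::cout.fail() ? '1': '0';
    fputc( fail, stdout );                   // report the stream state
    fputs( "\xc3\xbc", stdout );             // the same two bytes, via C stdio
}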

I simply compiled with cl /EHsc test.cpp.

Windows XP: Output in a console window is Ã¼0Ã¼ (translated to Codepage 1252; it originally shows some line-drawing characters in the default codepage, perhaps 437). When I change the settings of the console window to use the "Lucida Console" font and run my test.exe again, the output changes to 1ü, which means

  • the character ü can be written using fputs and its UTF-8 encoding C3 BC
  • std::cout does not work for whatever reason
  • the stream's failbit is set after trying to write the character

Windows 7: Output using Consolas is ��0ü. Even more interesting: the correct bytes are probably written (at least when redirecting the output to a file) and the stream state is OK, but the two bytes are written as separate characters.

I tried to raise this issue on "Microsoft Connect" (see here), but MS has not been very helpful. You might also look here, as something similar has been asked before.

Can you reproduce this problem?

What am I doing wrong? Shouldn't std::cout and fputs have the same effect?

SOLVED (sort of): Following mike.dld's idea, I implemented a std::stringbuf that does the conversion from UTF-8 to Windows-1252 in sync() and replaced the streambuf of std::cout with this converter (see my comment on mike.dld's answer).
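The converting stringbuf described above can be sketched portably as follows. This is a hypothetical reconstruction, not the author's code: the class name Utf8ToCp1252Buf is made up, the converted bytes go to an arbitrary std::ostream instead of the console, and only the code points that Windows-1252 shares with Latin-1 (U+0000 to U+007F and U+00A0 to U+00FF) are converted. Anything else, including 1252-specific characters such as the euro sign, becomes '?'.

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical sketch: collect UTF-8 text, convert it on sync() and forward it.
// Assumption: only U+0000-U+007F and U+00A0-U+00FF are mapped (same byte value
// in Windows-1252); other code points become '?'. Continuation bytes are not validated.
class Utf8ToCp1252Buf : public std::stringbuf {
public:
    explicit Utf8ToCp1252Buf(std::ostream& sink) : m_sink(sink) {}

protected:
    int sync() override {
        const std::string utf8 = str();
        std::string out;
        std::size_t i = 0;
        while (i < utf8.size()) {
            const unsigned char lead = static_cast<unsigned char>(utf8[i]);
            std::size_t len = 1;
            unsigned long cp = lead;
            if (lead >= 0xC0 && lead < 0xE0)      { len = 2; cp = lead & 0x1F; }
            else if (lead >= 0xE0 && lead < 0xF0) { len = 3; cp = lead & 0x0F; }
            else if (lead >= 0xF0 && lead < 0xF8) { len = 4; cp = lead & 0x07; }
            else if (lead >= 0x80)                { cp = 0xFFFD; }  // stray byte
            if (i + len > utf8.size()) break;     // incomplete: keep it for later
            for (std::size_t k = 1; k < len; ++k)
                cp = (cp << 6) | (static_cast<unsigned char>(utf8[i + k]) & 0x3F);
            out += (cp < 0x80 || (cp >= 0xA0 && cp <= 0xFF)) ? static_cast<char>(cp) : '?';
            i += len;
        }
        m_sink.write(out.data(), static_cast<std::streamsize>(out.size()));
        m_sink.flush();
        const std::string tail = utf8.substr(i);  // unconverted trailing bytes
        str(std::string());                       // reset the buffer...
        sputn(tail.data(), static_cast<std::streamsize>(tail.size()));  // ...and re-queue them
        return m_sink ? 0 : -1;
    }

private:
    std::ostream& m_sink;
};
```

Holding back an incomplete trailing sequence matters because a flush can happen between the two bytes of a character such as ü.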

I understand the question is quite old, but if someone is still interested, below is my solution. I've implemented a quite simple std::streambuf descendant and then passed it to each of the standard streams at the very beginning of program execution.

This allows you to use UTF-8 everywhere in your program. On input, data is read from the console as Unicode (UTF-16, via ReadConsoleW), then converted and returned to you as UTF-8. On output the opposite is done: the UTF-8 data you write is converted to UTF-16 and sent to the console with WriteConsoleW. No issues found so far.

Also note that this solution doesn't require any codepage modification, whether with SetConsoleCP, SetConsoleOutputCP, chcp, or something else.

That's the stream buffer:

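#include <windows.h>
#include <streambuf>
#include <string>

// Conversion helpers, omitted here (see the note after the usage example).
std::wstring utf8_to_wstring(const std::string& str);
std::string wstring_to_utf8(const std::wstring& str);

class ConsoleStreamBufWin32 : public std::streambuf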
{
public:
    ConsoleStreamBufWin32(DWORD handleId, bool isInput);

protected:
    // std::basic_streambuf
    virtual std::streambuf* setbuf(char_type* s, std::streamsize n);
    virtual int sync();
    virtual int_type underflow();
    virtual int_type overflow(int_type c = traits_type::eof());

private:
    HANDLE const m_handle;
    bool const m_isInput;
    std::string m_buffer;
};

ConsoleStreamBufWin32::ConsoleStreamBufWin32(DWORD handleId, bool isInput) :
    m_handle(::GetStdHandle(handleId)),
    m_isInput(isInput),
    m_buffer()
{
    if (m_isInput)
    {
        setg(0, 0, 0);
    }
}

std::streambuf* ConsoleStreamBufWin32::setbuf(char_type* /*s*/, std::streamsize /*n*/)
{
    return 0;
}

int ConsoleStreamBufWin32::sync()
{
    if (m_isInput)
    {
        ::FlushConsoleInputBuffer(m_handle);
        setg(0, 0, 0);
    }
    else
    {
        if (m_buffer.empty())
        {
            return 0;
        }

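        std::wstring const wideBuffer = utf8_to_wstring(m_buffer);
        DWORD writtenSize;
        if (!::WriteConsoleW(m_handle, wideBuffer.c_str(), static_cast<DWORD>(wideBuffer.size()), &writtenSize, NULL))
        {
            m_buffer.clear();
            return -1; // report the failure; flush() then sets badbit
        }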
    }

    m_buffer.clear();

    return 0;
}

ConsoleStreamBufWin32::int_type ConsoleStreamBufWin32::underflow()
{
    if (!m_isInput)
    {
        return traits_type::eof();
    }

    if (gptr() >= egptr())
    {
        wchar_t wideBuffer[128];
        DWORD readSize;
        if (!::ReadConsoleW(m_handle, wideBuffer, ARRAYSIZE(wideBuffer) - 1, &readSize, NULL))
        {
            return traits_type::eof();
        }

        wideBuffer[readSize] = L'\0';
        m_buffer = wstring_to_utf8(wideBuffer);

        setg(&m_buffer[0], &m_buffer[0], &m_buffer[0] + m_buffer.size());

        if (gptr() >= egptr())
        {
            return traits_type::eof();
        }
    }

    return sgetc();
}

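ConsoleStreamBufWin32::int_type ConsoleStreamBufWin32::overflow(int_type c)
{
    if (m_isInput)
    {
        return traits_type::eof();
    }

    // overflow(eof()) carries no character; don't append a bogus 0xFF byte.
    if (!traits_type::eq_int_type(c, traits_type::eof()))
    {
        m_buffer += traits_type::to_char_type(c);
    }
    return traits_type::not_eof(c);
}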

The usage then is as follows:

template<typename StreamT>
inline void FixStdStream(DWORD handleId, bool isInput, StreamT& stream)
{
    if (::GetFileType(::GetStdHandle(handleId)) == FILE_TYPE_CHAR)
    {
        stream.rdbuf(new ConsoleStreamBufWin32(handleId, isInput));
    }
}

// ...

int main()
{
    FixStdStream(STD_INPUT_HANDLE, true, std::cin);
    FixStdStream(STD_OUTPUT_HANDLE, false, std::cout);
    FixStdStream(STD_ERROR_HANDLE, false, std::cerr);

    // ...

    std::cout << "\xc3\xbc" << std::endl;

    // ...
}

The omitted helpers wstring_to_utf8 and utf8_to_wstring can easily be implemented with the WinAPI functions WideCharToMultiByte and MultiByteToWideChar.
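If you'd rather not depend on WinAPI for the conversion (for example, to unit-test it off Windows), here is a hand-written sketch. It swaps in a different technique from the WideCharToMultiByte/MultiByteToWideChar calls mentioned above: it converts between a UTF-8 std::string and a UTF-16 std::u16string directly. The names utf8_to_utf16 and utf16_to_utf8 are hypothetical, and malformed input is replaced by U+FFFD rather than reported. On Windows, where wchar_t is 16 bits, a std::u16string can be copied into a std::wstring element by element.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helpers (not WinAPI): UTF-8 <-> UTF-16 by hand.
// Malformed input is replaced with U+FFFD instead of throwing.
std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::size_t len;
        char32_t cp;
        if (lead < 0x80)                      { len = 1; cp = lead; }
        else if (lead >= 0xC2 && lead < 0xE0) { len = 2; cp = lead & 0x1F; }
        else if (lead >= 0xE0 && lead < 0xF0) { len = 3; cp = lead & 0x0F; }
        else if (lead >= 0xF0 && lead < 0xF5) { len = 4; cp = lead & 0x07; }
        else { out += u'\uFFFD'; ++i; continue; }      // invalid lead byte
        bool ok = i + len <= in.size();
        for (std::size_t k = 1; ok && k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            ok = (c & 0xC0) == 0x80;                   // must be a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }
        // Reject overlong 3/4-byte forms, surrogates and values above U+10FFFF.
        if (ok && ((len == 3 && cp < 0x800) || (len == 4 && (cp < 0x10000 || cp > 0x10FFFF)) ||
                   (cp >= 0xD800 && cp <= 0xDFFF)))
            ok = false;
        if (!ok) { out += u'\uFFFD'; ++i; continue; }
        if (cp < 0x10000) {
            out += static_cast<char16_t>(cp);
        } else {                                       // encode as a surrogate pair
            cp -= 0x10000;
            out += static_cast<char16_t>(0xD800 + (cp >> 10));
            out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
        }
        i += len;
    }
    return out;
}

std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;                                       // consumed the low surrogate
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;                               // unpaired surrogate
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```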

Oi. Congratulations on finding a way to change the code page of the console from inside your program. I didn't know about that call; I always had to use chcp.

I'm guessing the C++ default locale is getting involved. By default it will use the code page provided by GetThreadLocale() to determine the text encoding of non-wstring data. This generally defaults to CP1252. You could try using SetThreadLocale() to get to UTF-8 (if it even does that; I can't recall), in the hope that std::locale defaults to something that can handle your UTF-8 encoding.

It's time to close this now. Stephan T. Lavavej says the behaviour is "by design", although I cannot follow this explanation.

My current knowledge is: the Windows XP console in the UTF-8 codepage does not work with C++ iostreams.

Windows XP is getting out of fashion now, and so is VS 2008. I'd be interested to hear whether the problem still exists on newer Windows systems.

On Windows 7 the effect is probably due to the way C++ streams output characters. As seen in an answer to "Properly print utf8 characters in windows console", UTF-8 output fails with C stdio as well when printing one byte after another, like putc('\xc3'); putc('\xbc');. Perhaps this is what C++ streams do here.
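If the byte-by-byte explanation is right, the remedy is to make sure the console never receives only part of a character. Here is a sketch of the bookkeeping a buffering layer would need, assuming the text is UTF-8: the hypothetical function complete_utf8_prefix returns how many leading bytes form complete sequences and can safely be written; the remaining bytes would be kept until more data arrives.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: number of leading bytes of `buf` that form complete
// UTF-8 sequences. Flushing only this prefix never sends '\xc3' without '\xbc'.
// Invalid lead bytes count as complete one-byte units so they are never held back.
std::size_t complete_utf8_prefix(const std::string& buf) {
    const std::size_t n = buf.size();
    // A sequence is at most 4 bytes, so only the last 3 bytes can start an unfinished one.
    std::size_t back = 0;
    while (back < 3 && back < n) {
        const unsigned char c = static_cast<unsigned char>(buf[n - 1 - back]);
        if ((c & 0xC0) != 0x80) {                 // ASCII or a lead byte
            const std::size_t need = c < 0x80 ? 1
                                   : (c & 0xE0) == 0xC0 ? 2
                                   : (c & 0xF0) == 0xE0 ? 3
                                   : (c & 0xF8) == 0xF0 ? 4 : 1;
            // Complete if enough bytes follow the lead byte; otherwise cut before it.
            return back + 1 >= need ? n : n - back - 1;
        }
        ++back;                                   // continuation byte: keep scanning back
    }
    return n;  // only continuation bytes at the end: nothing sensible to wait for
}
```

A stream buffer like the ones in the answers here could write only this prefix in sync() and keep the remainder for the next flush.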

I just followed mike.dld's answer to this question and added printf support for UTF-8 strings.

As mkluwe mentioned in his answer, by default the printf function sends its output to the console one byte at a time, and the console can't handle single bytes correctly. My method is quite simple: I format the whole content into an internal string buffer with the snprintf family of functions, and then dump the buffer to std::cout.
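The core of this approach, formatting everything first and writing it in one piece, can be sketched as a small standalone helper. format_to_string is a hypothetical name, not from the answer. Two details are easy to get wrong here: a va_list has to be passed to vsnprintf (plain snprintf cannot take one), and it must be copied with va_copy if it is read twice.

```cpp
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: format printf-style arguments into a std::string in one
// go, so the caller can hand the complete UTF-8 text to std::cout at once.
std::string format_to_string(const char* fmt, ...) {
    va_list args;
    va_start(args, fmt);
    va_list measure;
    va_copy(measure, args);                        // a va_list may only be traversed once
    const int len = std::vsnprintf(nullptr, 0, fmt, measure);
    va_end(measure);
    std::string out;
    if (len > 0) {
        out.resize(static_cast<std::size_t>(len) + 1);  // room for the '\0' vsnprintf writes
        std::vsnprintf(&out[0], out.size(), fmt, args);
        out.resize(static_cast<std::size_t>(len));      // drop the terminator
    }
    va_end(args);
    return out;
}
```

Used as std::cout << format_to_string("%s\n", "\xc3\xbc");, the stream receives both bytes of ü in a single insertion.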

Here is the full testing code:

#include <iostream>
#include <locale>
#include <windows.h>
#include <cstdlib>

using namespace std;

// https://stackoverflow.com/questions/4358870/convert-wstring-to-string-encoded-in-utf-8
#include <codecvt>
#include <string>

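// convert UTF-8 string to wstring
// codecvt_utf8_utf16 (rather than codecvt_utf8) produces UTF-16 surrogate pairs,
// which a 16-bit wchar_t on Windows needs for characters outside the BMP.
// Note: std::wstring_convert and <codecvt> are deprecated since C++17.
std::wstring utf8_to_wstring (const std::string& str)
{
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> myconv;
    return myconv.from_bytes(str);
}

// convert wstring to UTF-8 string
std::string wstring_to_utf8 (const std::wstring& str)
{
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> myconv;
    return myconv.to_bytes(str);
}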

// https://stackoverflow.com/questions/1660492/utf-8-output-on-windows-console
// mike.dld's answer
class ConsoleStreamBufWin32 : public std::streambuf
{
public:
    ConsoleStreamBufWin32(DWORD handleId, bool isInput);

protected:
    // std::basic_streambuf
    virtual std::streambuf* setbuf(char_type* s, std::streamsize n);
    virtual int sync();
    virtual int_type underflow();
    virtual int_type overflow(int_type c = traits_type::eof());

private:
    HANDLE const m_handle;
    bool const m_isInput;
    std::string m_buffer;
};

ConsoleStreamBufWin32::ConsoleStreamBufWin32(DWORD handleId, bool isInput) :
    m_handle(::GetStdHandle(handleId)),
    m_isInput(isInput),
    m_buffer()
{
    if (m_isInput)
    {
        setg(0, 0, 0);
    }
}

std::streambuf* ConsoleStreamBufWin32::setbuf(char_type* /*s*/, std::streamsize /*n*/)
{
    return 0;
}

int ConsoleStreamBufWin32::sync()
{
    if (m_isInput)
    {
        ::FlushConsoleInputBuffer(m_handle);
        setg(0, 0, 0);
    }
    else
    {
        if (m_buffer.empty())
        {
            return 0;
        }

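        std::wstring const wideBuffer = utf8_to_wstring(m_buffer);
        DWORD writtenSize;
        if (!::WriteConsoleW(m_handle, wideBuffer.c_str(), static_cast<DWORD>(wideBuffer.size()), &writtenSize, NULL))
        {
            m_buffer.clear();
            return -1; // report the failure; flush() then sets badbit
        }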
    }

    m_buffer.clear();

    return 0;
}

ConsoleStreamBufWin32::int_type ConsoleStreamBufWin32::underflow()
{
    if (!m_isInput)
    {
        return traits_type::eof();
    }

    if (gptr() >= egptr())
    {
        wchar_t wideBuffer[128];
        DWORD readSize;
        if (!::ReadConsoleW(m_handle, wideBuffer, ARRAYSIZE(wideBuffer) - 1, &readSize, NULL))
        {
            return traits_type::eof();
        }

        wideBuffer[readSize] = L'\0';
        m_buffer = wstring_to_utf8(wideBuffer);

        setg(&m_buffer[0], &m_buffer[0], &m_buffer[0] + m_buffer.size());

        if (gptr() >= egptr())
        {
            return traits_type::eof();
        }
    }

    return sgetc();
}

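ConsoleStreamBufWin32::int_type ConsoleStreamBufWin32::overflow(int_type c)
{
    if (m_isInput)
    {
        return traits_type::eof();
    }

    // overflow(eof()) carries no character; don't append a bogus 0xFF byte.
    if (!traits_type::eq_int_type(c, traits_type::eof()))
    {
        m_buffer += traits_type::to_char_type(c);
    }
    return traits_type::not_eof(c);
}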

template<typename StreamT>
inline void FixStdStream(DWORD handleId, bool isInput, StreamT& stream)
{
    if (::GetFileType(::GetStdHandle(handleId)) == FILE_TYPE_CHAR)
    {
        stream.rdbuf(new ConsoleStreamBufWin32(handleId, isInput));
    }
}

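// some code is from this blog
// https://blog.csdn.net/witton/article/details/108087135

#include <cstdarg>
#include <cstdio>

// Route printf through std::cout so the whole string reaches the console in one piece.
#define printf(fmt, ...) __fprint(stdout, fmt, ##__VA_ARGS__ )

int __vfprint(FILE *fp, const char *fmt, va_list va)
{
    // https://stackoverflow.com/questions/7315936/which-of-sprintf-snprintf-is-more-secure
    // A va_list needs vsnprintf (snprintf cannot take one) and can only be
    // traversed once, so measure the length with a copy.
    va_list va_measure;
    va_copy(va_measure, va);
    int len = vsnprintf(NULL, 0, fmt, va_measure);
    va_end(va_measure);
    if (len < 0)
        return len;
    size_t nbytes = (size_t)len + 1; /* +1 for the '\0' */
    char *str = (char*)malloc(nbytes);
    if (str == NULL)
        return -1;
    vsnprintf(str, nbytes, fmt, va);
    std::cout << str;
    free(str);
    return len; // like printf: characters written, excluding the '\0'
}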

int __fprint(FILE *fp, const char *fmt, ...)
{
    va_list va;
    va_start(va, fmt);
    int n = __vfprint(fp, fmt, va);
    va_end(va);
    return n;
}

int main()
{
    FixStdStream(STD_INPUT_HANDLE, true, std::cin);
    FixStdStream(STD_OUTPUT_HANDLE, false, std::cout);
    FixStdStream(STD_ERROR_HANDLE, false, std::cerr);

    // ...

    std::cout << "\xc3\xbc" << std::endl;

    printf("\xc3\xbc");

    // ...
    return 0;
}

The source code is saved in UTF-8 format, built with MSYS2's GCC and run under Windows 7 64-bit. Here is the result:

ü
ü

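Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.

© 2020-2024 STACKOOM.COM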