
How can I make Unicode iostream i/o work in both Windows and Unix-land?

Note: This is a question-with-answer in order to document a technique that others might find useful, and in order to perhaps become aware of others' even better solutions. Do feel free to add critique or questions as comments. Also do feel free to add additional answers. :)


Problem #1:

  • Console support for Unicode via streams is severely limited at the Windows API level. The only relevant codepage available for ordinary desktop applications is 65001, UTF-8. And then interactive input fails at the API level, and even output of non-ASCII characters fails, and the C++ standard library implementations do not provide work-arounds for this problem.
#include <iostream>
#include <string>
using namespace std;

auto main() -> int
{
    wstring username;
    wcout << L"Hi, what’s your name? ";
    getline( wcin, username );
    wcout << "Pleased to meet you, " << username << "!\n";
}
H:\personal\web\blog alf on programming at wordpress\002\code>chcp 65001
Active code page: 65001

H:\personal\web\blog alf on programming at wordpress\002\code>g++ problem.input.cpp -std=c++14

H:\personal\web\blog alf on programming at wordpress\002\code>a
Hi, whatSøren Moskégård
                             ← No visible output.
H:\personal\web\blog alf on programming at wordpress\002\code>_

At the Windows API level a solution is to use non-stream-based direct console i/o when the relevant standard stream is bound to the console, for example via the WriteConsole API function. And as an extension supported by both the Visual C++ and MinGW g++ standard libraries, a mode can be set for the standard wide streams where WriteConsole is used, and there is also a mode for converting to/from UTF-8 as the external encoding.

And in Unix-land, a single call to setlocale( LC_ALL, "" ), or its higher level C++ equivalent, suffices to make the wide streams work.
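A minimal sketch of the "higher level C++ equivalent", assuming a hosted implementation. The helper name `adopt_environment_locale` is made up for illustration, and the fallback to the classic locale is an assumption about what a program would want when the environment names a locale the system doesn't support.

```cpp
#include <locale>       // std::locale
#include <stdexcept>    // std::runtime_error
#include <string>       // std::string

// Hypothetical helper: makes the user's environment locale the global locale.
// For a named locale, std::locale::global also performs the C level
// setlocale( LC_ALL, name ).
inline auto adopt_environment_locale()
    -> std::string
{
    try
    {
        std::locale::global( std::locale( "" ) );
    }
    catch( const std::runtime_error& )
    {
        // The environment named a locale this system doesn't support;
        // fall back to the classic "C" locale (an assumed policy).
        std::locale::global( std::locale::classic() );
    }
    return std::locale().name();    // Name of the now-global locale.
}
```

Such a call would normally go first thing in the program, before any wide stream i/o takes place.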

But how can such modes be set transparently & automatically, so that the same ordinary standard C++ code using the wide streams will work both in Windows and Unix-land?

Note, for the readers who shudder at the thought of using wide text in a Unix-land program, that this is in effect a prerequisite for portable code that uses UTF-8 narrow text console i/o in Unix-land. Namely, code that automatically uses UTF-8 narrow text in Unix-land and wide text in Windows becomes possible and can be built on top of support for Unicode in Windows. But without such support there is no portability for the general case.


Problem #2:

  • With use of wide streams, default conversion of output items to wchar_t const* doesn't work.
#include <iostream>
using namespace std;

struct Byte_string
{ operator char const* () const { return "Hurray, it works!"; } };

struct Wide_string
{ operator wchar_t const* () const { return L"Hurray, it works!"; } };

auto main() -> int
{
    wcout << "Byte string pointer: " << Byte_string() << endl;
    wcout << "Wide string pointer: " << Wide_string() << endl;
}
Byte string pointer: Hurray, it works!
Wide string pointer: 0x4ad018

This is a defect of the inconsistency type at the implementation level in the standard, that I reported long ago. I'm not sure of the status; it may have been forgotten (I never got any mailings about it), or maybe a fix will be applied in C++17. Anyway, how can one work around that?


In short, how can one make standard C++ code that uses Unicode wide text console i/o work and be practical in both Windows and Unix-land?

Fix for the conversion problem:

cppx/stdlib/iostreams_conversion_defect.fix.hpp
#pragma once
//----------------------------------------------------------------------------------------
// PROBLEM DESCRIPTION.
//
// Output of wchar_t const* is only supported via an operator<< template. User-defined
// conversions are not considered for template matching. This results in actual argument
// with user conversion to wchar_t const*, for a wide stream, being presented as the
// pointer value instead of the string.

#include <iostream>

#ifndef CPPX_NO_IOSTREAM_CONVERSION_FIX
    namespace std{
        template< class Char_traits >
        inline auto operator<<(
            basic_ostream<wchar_t, Char_traits>&    stream,
            wchar_t const                           ch
            )
            -> basic_ostream<wchar_t, Char_traits>&
        { return operator<< <wchar_t, Char_traits>( stream, ch ); }

        template< class Char_traits >
        inline auto operator<<(
            basic_ostream<wchar_t, Char_traits>&    stream,
            wchar_t const* const                    s
            )
            -> basic_ostream<wchar_t, Char_traits>&
        { return operator<< <wchar_t, Char_traits>( stream, s ); }
    }  // namespace std
#endif
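The effect described in the header comment can be reproduced in isolation with a `std::wostringstream`. The sketch below is only an illustration: the function names are hypothetical, and the explicit cast is a call-site stopgap shown for comparison, not the fix used in this answer.

```cpp
#include <sstream>      // std::wostringstream
#include <string>       // std::wstring

struct Wide_string
{ operator wchar_t const* () const { return L"Hurray, it works!"; } };

// Streams the object as-is. Template argument deduction fails for the
// wchar_t const* operator<< template, so the void const* member overload
// is chosen and the pointer value is formatted.
inline auto formatted_directly( const Wide_string& ws )
    -> std::wstring
{
    std::wostringstream stream;
    stream << ws;
    return stream.str();
}

// Converts explicitly first, so that the argument already has type
// wchar_t const* and the operator<< template matches.
inline auto formatted_with_cast( const Wide_string& ws )
    -> std::wstring
{
    std::wostringstream stream;
    stream << static_cast<wchar_t const*>( ws );
    return stream.str();
}
```

With the fix header included, the first function would be expected to produce the text too, since the added overloads take a plain `wchar_t const*` parameter for which user-defined conversions are considered.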

Setting direct i/o mode in Windows:

This is a standard library extension that's supported by both Visual C++ and MinGW g++.

First, just because it's used in the code, here is the definition of the Ptr type builder (the main drawback of library-provided type builders is that ordinary type inference doesn't kick in, i.e. it's necessary in some cases to still use the raw operator notation):

cppx/core_language/type_builders.hpp
⋮
template< class T > using Ptr = T*;
⋮
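A small sketch of what the alias amounts to, for readers who haven't met this notation. The function `first_char` is made up for illustration.

```cpp
#include <type_traits>  // std::is_same

template< class T > using Ptr = T*;     // Same definition as in the excerpt.

// Ptr< T > is just another spelling of T*; the alias introduces no new type.
static_assert( std::is_same< Ptr< int >, int* >::value, "Ptr<int> is int*" );
static_assert(
    std::is_same< Ptr< const char >, const char* >::value,
    "Ptr<const char> is const char*"
    );

// Reads left to right: a constant pointer to constant char.
inline auto first_char( const Ptr< const char > s )
    -> char
{ return *s; }
```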

A helper definition, Iostream_mode, is kept in its own header, cppx/stdlib/Iostream_mode.hpp, because it's used in more than one file.

Mode setters (base functionality):

cppx/stdlib/impl/utf8_mode.for_windows.hpp
#pragma once
// UTF-8 mode for a stream in Windows.
#ifndef _WIN32
#   error This is a Windows only implementation.
#endif

#include <cppx/stdlib/Iostream_mode.hpp>

#include <stdio.h>      // FILE, stdin, stdout, stderr, etc.

// Non-standard headers, which are de facto standard in Windows:
#include <io.h>         // _setmode, _isatty, _fileno etc.
#include <fcntl.h>      // _O_WTEXT etc.

namespace cppx {

    inline
    auto set_utf8_mode( const Ptr< FILE > f )
        -> Iostream_mode
    {
        const int file_number = _fileno( f );       // See docs for error handling.
        if( file_number == -1 ) { return Iostream_mode::unknown; }
        const int new_mode = (_isatty( file_number )? _O_WTEXT : _O_U8TEXT);
        const int previous_mode = _setmode( file_number, new_mode );
        return (0?Iostream_mode()
            : previous_mode == -1?      Iostream_mode::unknown
            : new_mode == _O_WTEXT?     Iostream_mode::direct_io
            :                           Iostream_mode::utf_8
            );
    }

}  // namespace cppx

cppx/stdlib/impl/utf8_mode.generic.hpp
#pragma once
#include <stdio.h>      // FILE, stdin, stdout, stderr, etc.
#include <cppx/core_language/type_builders.hpp>     // cppx::Ptr

namespace cppx {

    inline
    auto set_utf8_mode( const Ptr< FILE > )
        -> Iostream_mode
    { return Iostream_mode::unknown; }

}  // namespace cppx

cppx/stdlib/utf8_mode.hpp
#pragma once
// UTF-8 mode for a stream. For Unix-land this is a no-op & the locale must be UTF-8.

#include <stdio.h>      // FILE, needed for the declaration below.

#include <cppx/core_language/type_builders.hpp>     // cppx::Ptr
#include <cppx/stdlib/Iostream_mode.hpp>

namespace cppx {
    inline
    auto set_utf8_mode( const Ptr< FILE > ) -> Iostream_mode;
}  // namespace cppx

#ifdef _WIN32   // This also covers 64-bit Windows.
#   include "impl/utf8_mode.for_windows.hpp"    // Using Windows-specific _setmode.
#else
#   include "impl/utf8_mode.generic.hpp"        // A do-nothing implementation.
#endif
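The mode setter code relies on an `Iostream_mode` type from cppx/stdlib/Iostream_mode.hpp, whose definition is not reproduced in this text. Judging only from how it is used (the enumerators `unknown`, `utf_8` and `direct_io`, plus a default-constructed `Iostream_mode()`), a compatible definition could look like the sketch below. This is an assumption inferred from usage, not necessarily the author's actual header; in particular the underlying type and the enumerator order are guesses.

```cpp
// Assumed reconstruction of cppx/stdlib/Iostream_mode.hpp, inferred from usage only.
// Mode for a possibly console-attached stream.
namespace cppx {
    enum class Iostream_mode: int { unknown, utf_8, direct_io };
}  // namespace cppx
```

With `unknown` listed first, a value-initialized `Iostream_mode()` compares equal to `Iostream_mode::unknown`.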

Configuring the standard streams.

In addition to setting direct console i/o mode or UTF-8 as appropriate in Windows, this fixes the implicit conversion defect; (indirectly) calls setlocale so that wide streams work in Unix-land; sets boolalpha just for good measure, as a more reasonable default; and includes all standard library headers to do with iostreams (I don't show the separate header file that does that, and it is to a degree a personal preference how much to include or whether to do such inclusion at all):

cppx/stdlib/iostreams.hpp
#pragma once
// Standard iostreams but configured to work, plus, as utility, with boolalpha set.

#include <raw_stdlib/iostreams.hpp>     // <iostream>, <sstream>, <fstream> etc. for convenience.

#include <cppx/core_language/type_builders.hpp>     // cppx::Ptr
#include <cppx/stdlib/utf8_mode.hpp>    // stdin etc., stdlib::set_utf8_mode
#include <locale>                       // std::locale
#include <string>                       // std::string

#include <cppx/stdlib/impl/iostreams_conversion_defect.fix.hpp>     // Support arg conv.

inline auto operator<< ( std::wostream& stream, const std::string& s )
    -> std::wostream&
{ return (stream << s.c_str()); }

// The following code's sole purpose is to automatically initialize the streams.
namespace cppx { namespace utf8_iostreams {
    using std::locale;
    using std::ostream; using std::cin; using std::cout; using std::cerr; using std::clog;
    using std::wostream; using std::wcin; using std::wcout; using std::wcerr; using std::wclog;
    using std::boolalpha;

    namespace detail {
        using std::wstreambuf;

        // Based on "Filtering streambufs" code by James Kanze published at
        // <url: http://gabisoft.free.fr/articles/fltrsbf1.html>.
        class Correcting_input_buffer
            : public wstreambuf
        {
        private:
            wstreambuf*     provider_;
            wchar_t         buffer_;

        protected:
            auto underflow()
                -> int_type override
            {
                if( gptr() < egptr() )  { return *gptr(); }

                const int_type result = provider_->sbumpc();
                if( result == L'\n' )
                {
                    // Ad hoc workaround for g++ extra newline undesirable behavior:
                    provider_->pubsync();
                }

                if( traits_type::not_eof( result ) )
                {
                    buffer_ = result;
                    setg( &buffer_, &buffer_, &buffer_ + 1 );
                }
                return result ;
            }

        public:
            Correcting_input_buffer( wstreambuf* a_provider )
                : provider_( a_provider )
            {}
        };
    }  // namespace detail

    class Usage
    {
    private:
        static
        void init_once()
        {
            // In Windows there is no UTF-8 encoding spec for the locale, in Unix-land
            // it's the default. From Microsoft's documentation: "If you provide a code
            // page like UTF-7 or UTF-8, setlocale will fail, returning NULL". Still
            // this call is essential for making the wide streams work correctly in
            // Unix-land.
            locale::global( locale( "" ) ); // Effects a `setlocale( LC_ALL, "" )`.

            for( const Ptr<FILE> c_stream : {stdin, stdout, stderr} )
            {
                const auto new_mode = set_utf8_mode( c_stream );
                if( c_stream == stdin && new_mode == Iostream_mode::direct_io )
                {
                    static detail::Correcting_input_buffer  correcting_buffer( wcin.rdbuf() );
                    wcin.rdbuf( &correcting_buffer );
                }
            }

            for( const Ptr<ostream> stream_ptr : {&cout, &cerr, &clog} )
            {
                *stream_ptr << boolalpha;
            }

            for( const Ptr<wostream> stream_ptr : {&wcout, &wcerr, &wclog} )
            {
                *stream_ptr << boolalpha;
            }
        }

    public:
        Usage()
        { static const bool dummy = (init_once(), true); (void) dummy; }
    };

    namespace detail {
        const Usage usage;
    }  // namespace detail

}}  // namespace cppx::utf8_iostreams
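The input side of that header rests on a one-character filtering stream buffer placed in front of the real one. A reduced sketch of the same structure, without the g++ newline workaround and run against a string stream so that it can be tried anywhere; the names `Forwarding_input_buffer` and `first_line_via_filter` are made up for illustration.

```cpp
#include <istream>      // std::wistream
#include <sstream>      // std::wistringstream
#include <streambuf>    // std::wstreambuf
#include <string>       // std::wstring, std::getline

// One-character filtering buffer: each read is forwarded to `provider_`.
class Forwarding_input_buffer
    : public std::wstreambuf
{
    std::wstreambuf*    provider_;
    wchar_t             buffer_;

protected:
    auto underflow()
        -> int_type override
    {
        if( gptr() < egptr() ) { return traits_type::to_int_type( *gptr() ); }

        // Consume one character from the underlying buffer.
        const int_type result = provider_->sbumpc();
        if( !traits_type::eq_int_type( result, traits_type::eof() ) )
        {
            // Expose it as a get area of exactly one character.
            buffer_ = traits_type::to_char_type( result );
            setg( &buffer_, &buffer_, &buffer_ + 1 );
        }
        return result;
    }

public:
    explicit Forwarding_input_buffer( std::wstreambuf* a_provider )
        : provider_( a_provider ), buffer_()
    {}
};

// Reads the first line of `text` through the filter.
inline auto first_line_via_filter( const std::wstring& text )
    -> std::wstring
{
    std::wistringstream         source( text );
    Forwarding_input_buffer     filter( source.rdbuf() );
    std::wistream               input( &filter );

    std::wstring line;
    std::getline( input, line );
    return line;
}
```

The one-character get area is what lets a filter like this inspect every character as it passes, which is where the header's newline handling hooks in.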

The two example programs in the question are fixed simply by including the above header instead of, or in addition to, <iostream>. When it's included in addition to <iostream>, the inclusion can be in a separate translation unit (except for the implicit conversion defect fix; if that's desired, the header for it must be included somehow in the code that needs it). Or it can be done e.g. as a forced include in the build command.
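Inclusion in any one translation unit is enough because the configuration runs as a side effect of constructing a namespace scope object. A reduced sketch of that initialize-once idiom, with a counter standing in for the real stream setup; the names `Setup` and `setup_count` are made up for illustration.

```cpp
// Stand-in for the real work (locale and stream mode setup).
inline auto setup_count()
    -> int&
{
    static int count = 0;
    return count;
}

class Setup
{
    static void init_once() { ++setup_count(); }

public:
    // The function-local static is initialized on the first call only, so
    // init_once runs a single time however many Setup objects are created.
    Setup()
    { static const bool dummy = (init_once(), true); (void) dummy; }
};

namespace detail {
    const Setup setup;      // Constructed during static initialization, before main.
}  // namespace detail
```

Since the constructor is an inline function, its local static is shared by all translation units that include the header, which is why including it in several places does not repeat the setup.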
