"C.UTF-8" C++ locale on Windows?

I'm in the process of fixing a large open-source cross-platform application so that it can handle file paths containing non-ANSI characters on Windows.


Update:

Based on answers and comments I got so far (thanks!) I feel like I should clarify some points:

  1. I cannot modify the code of dozens of third-party libraries to use std::wchar_t. This is just not an option. The solution has to work with plain ol' std::fopen(), std::ifstream, etc.

  2. The solution I outline below works 99% of the way, at least on the system I'm developing on (Windows 10 version 1909, build 18363.535). I haven't tested on any other system yet.

    The only remaining issue, at least on my system, is basically number formatting, and I'm hopeful that replacing the std::numpunct facet does the trick (but I haven't succeeded yet).
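For reference, replacing the std::numpunct facet could look roughly like the sketch below. It relies on a default-constructed std::numpunct<char> behaving like the classic "C" locale ('.' as decimal point, no digit grouping). The names with_classic_numpunct and format_number are hypothetical helpers made up for this illustration, and the sketch only covers narrow-character streams.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Returns a copy of `base` in which the std::numpunct<char> facet is replaced
// by a default-constructed one, i.e. '.' as decimal point and no grouping.
// All other facets (ctype, codecvt, ...) are kept from `base`.
inline std::locale with_classic_numpunct(const std::locale& base)
{
    // The locale takes ownership of the facet and destroys it when the last
    // locale referring to it goes away.
    return std::locale(base, new std::numpunct<char>());
}

// Hypothetical helper, only here to show the effect on stream output.
inline std::string format_number(double value, const std::locale& loc)
{
    std::ostringstream out;
    out.imbue(loc);
    out << value;
    return out.str();
}
```

One possible use would be std::locale::global(with_classic_numpunct(std::locale("en_US.UTF-8"))); at startup. Wide-character streams use a separate std::numpunct<wchar_t> facet, which this sketch does not touch.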


My current solution involves:

  1. Setting the C locale to .UTF-8 for the LC_CTYPE category on Windows (all other categories are set to the C locale as required by the application):

     // Required by the application.
     std::setlocale(LC_ALL, "C");

     // On Windows, we want std::fopen() and other functions dealing with strings
     // and file paths to accept narrow-character strings encoded in UTF-8.
     #ifdef _WIN32
     {
     #ifndef NDEBUG
         char* new_ctype_locale =
     #endif
             std::setlocale(LC_CTYPE, ".UTF-8");
         assert(new_ctype_locale != nullptr);
     }
     #endif

  2. Configuring boost::filesystem::path to use the en_US.UTF-8 locale so that it too can deal with paths containing non-ANSI characters:

     boost::filesystem::path::imbue(std::locale("en_US.UTF-8"));

The last missing bit is to fix file I/O using C++ streams such as

std::ifstream istream(filename);

The simplest solution is probably to set the global C++ locale at the beginning of the application:

std::locale::global(std::locale("en_US.UTF-8"));

However, that messes up the formatting of numbers, e.g. 1234.56 gets formatted as 1,234.56.

Is there a locale that just specifies the encoding to be UTF-8 without messing with number formatting (or other things)?

Basically I'm looking for the C.UTF-8 locale, but that doesn't seem to exist on Windows.

Update: I suppose one solution would be to reset some (most? all?) of the facets of the locale, but I'm having a hard time finding information on how to do that.
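One way to "reset" facets in bulk is the std::locale constructor that copies one locale and takes only a chosen category from another. The sketch below starts from the classic locale and borrows only the ctype category (which includes the codecvt facets) from a UTF-8 locale, so number formatting stays classic. The function name classic_with_utf8_ctype is hypothetical, the fallback to the classic locale is just a choice made for this sketch, and whether the result is enough for file paths on a given CRT is untested here.

```cpp
#include <locale>
#include <stdexcept>

// Builds a locale that is the classic "C" locale everywhere except for the
// ctype category, which is taken from the locale named `utf8_locale_name`.
// Falls back to the plain classic locale if that name is not available.
inline std::locale classic_with_utf8_ctype(const char* utf8_locale_name)
{
    try {
        const std::locale utf8(utf8_locale_name);  // throws if unknown
        return std::locale(std::locale::classic(), utf8, std::locale::ctype);
    } catch (const std::runtime_error&) {
        return std::locale::classic();
    }
}
```

It could then be installed with std::locale::global(classic_with_utf8_ctype("en_US.UTF-8"));.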

The Windows API does not respect the CRT locales, and the CRT implementations of fopen etc. call the narrow-char API directly, so changing the locale will not affect the encoding.

However, the Windows 10 May 2019 Update (version 1903) introduced support for UTF-8 in its narrow-char APIs. It can be enabled by embedding an appropriate manifest into your executable. Unfortunately, it is a very recent addition, so it might not be an option if you need to target older systems.

Your other options include converting manually to wchar_t or using a layer that does that for you (like Boost.Filesystem or, even better, Boost.Nowide).
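If you go the manifest route, a startup check along these lines may help detect systems where the setting did not take effect. This is only a sketch: it assumes that an active ANSI code page of CP_UTF8 (65001), as reported by GetACP(), means the narrow-char APIs accept UTF-8. The function name process_code_page_is_utf8 is made up for this example, and returning true on non-Windows platforms is an assumption of the sketch, not a guarantee.

```cpp
#ifdef _WIN32
#include <windows.h>
#endif

// Returns true when the process's active ANSI code page is UTF-8, which is
// what the narrow-char ("A") Win32 APIs use to interpret char strings.
inline bool process_code_page_is_utf8()
{
#ifdef _WIN32
    return GetACP() == CP_UTF8;  // CP_UTF8 is 65001
#else
    // Assumption for this sketch: elsewhere, paths are passed through as
    // bytes, so UTF-8 file names need no special setup.
    return true;
#endif
}
```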

Never mind locales.

On Windows you should use Microsoft's extension that adds a constructor taking const std::wchar_t* (expected to point to UTF-16) to std::ifstream.

Hopefully all your strings are UTF-8, or otherwise some consistent and sane encoding.

So just grab a UTF-8 → UTF-16 converter (they're lightweight) and pass filenames to std::ifstream as UTF-16 (in a std::wchar_t*).

(Be sure to #ifdef it out so it doesn't get attempted on any other platform.)

You should also use _wfopen instead of std::fopen, in the same way, for the same reason.

That's it.
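A minimal sketch of that approach, assuming all file names are UTF-8 and that the Windows build uses Microsoft's standard library (the wchar_t constructor of std::ifstream is not standard). The helper names utf8_to_wide, fopen_utf8 and open_ifstream_utf8 are made up for this example. The conversion uses the Win32 function MultiByteToWideChar, and on other platforms the helpers simply forward to the narrow-character versions.

```cpp
#include <cstdio>
#include <fstream>
#include <string>

#ifdef _WIN32
#include <windows.h>
#include <stdexcept>

// Converts UTF-8 to UTF-16 with the Win32 API; throws on invalid input.
inline std::wstring utf8_to_wide(const std::string& utf8)
{
    if (utf8.empty())
        return std::wstring();

    const int src_len = static_cast<int>(utf8.size());
    // First call: ask how many UTF-16 code units are needed.
    const int dst_len = MultiByteToWideChar(
        CP_UTF8, MB_ERR_INVALID_CHARS, utf8.data(), src_len, nullptr, 0);
    if (dst_len <= 0)
        throw std::runtime_error("invalid UTF-8 in file name");

    // Second call: perform the conversion into the sized buffer.
    std::wstring wide(static_cast<std::size_t>(dst_len), L'\0');
    MultiByteToWideChar(
        CP_UTF8, MB_ERR_INVALID_CHARS, utf8.data(), src_len, &wide[0], dst_len);
    return wide;
}
#endif

// Opens a C stream from a UTF-8 encoded path.
inline std::FILE* fopen_utf8(const std::string& path, const char* mode)
{
#ifdef _WIN32
    return _wfopen(utf8_to_wide(path).c_str(), utf8_to_wide(mode).c_str());
#else
    return std::fopen(path.c_str(), mode);
#endif
}

// Opens an input file stream from a UTF-8 encoded path.
inline std::ifstream open_ifstream_utf8(const std::string& path)
{
#ifdef _WIN32
    // The const wchar_t* constructor is the Microsoft extension.
    return std::ifstream(utf8_to_wide(path).c_str());
#else
    return std::ifstream(path);
#endif
}
```

Call sites would then use fopen_utf8(filename, "rb") or open_ifstream_utf8(filename) instead of std::fopen and std::ifstream directly, which only helps for code you control.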
