
Get the user's codepage name for functions in boost::locale::conv

The task at hand

I'm parsing a filename from a UTF-8 encoded XML file on Windows. I need to pass that filename on to a function that I can't change. Internally it uses _fsopen(), which does not support Unicode strings.

Current approach

My current approach is to convert the filename to the user's charset, hoping that the filename is representable in that encoding. I'm using boost::locale::conv::from_utf() to convert from UTF-8 and boost::locale::util::get_system_locale() to get the name of the current locale.

Life is good?

I'm on a German system using code page Windows-1252, so get_system_locale() correctly yields de_DE.windows-1252. If I test the approach with a filename containing an umlaut, everything works as expected.

The Problem

Just to make sure, I switched my system locale to Ukrainian, which uses code page Windows-1251. With a Cyrillic letter in the filename, my approach fails. The reason is that get_system_locale() still yields de_DE.windows-1252, which is now incorrect.

On the other hand, GetACP() correctly yields 1252 for the German locale and 1251 for the Ukrainian locale. I also know that Boost.Locale can convert to a given charset, as this small test program works as I expect:

#include <boost/locale.hpp>
#include <iostream>
#include <string>
#include <windows.h>

int main()
{
    std::cout << "Codepage: " << GetACP() << std::endl;
    std::cout << "Boost.Locale: " << boost::locale::util::get_system_locale() << std::endl;

    namespace blc = boost::locale::conv;
    // Cyrillic small letter zhe -> \xe6 (ж on 1251, æ on 1252)
    std::string const test1251 = blc::from_utf(std::string("\xd0\xb6"), "windows-1251");
    std::cout << "1251: " << static_cast<int>(test1251.front()) << std::endl;
    // Latin small letter sharp s -> \xdf (Я on 1251, ß on 1252)
    auto const test1252 = blc::from_utf(std::string("\xc3\x9f"), "windows-1252");
    std::cout << "1252: " << static_cast<int>(test1252.front()) << std::endl;

}

Questions

  • How can I query the name of the user locale in a format Boost.Locale supports? std::locale("").name() yields German_Germany.1252, and using it results in a boost::locale::conv::invalid_charset_error exception.

  • Is it possible that the system locale remains de_DE.windows-1252 although I'm supposedly changing it as local admin? Similarly, the system language is German although my account's language is English. (The login screen is German until I log in.)

  • Should I stick with using short filenames? That does not seem to work reliably, though.

Fine print

  • Compiler is MSVC18
  • Boost is version 1.56.0, backend supposedly winapi
  • System is Win7, system language is German, user language is English

ANSI is deprecated, so don't bother with it.

Windows uses UTF-16 internally, so convert from UTF-8 to UTF-16 using MultiByteToWideChar. This conversion is safe: every valid UTF-8 string has a UTF-16 representation.

std::wstring getU16(const std::string &str)
{
    if (str.empty()) return std::wstring();
    int sz = MultiByteToWideChar(CP_UTF8, 0, &str[0], (int)str.size(), 0, 0);
    std::wstring res(sz, 0);
    MultiByteToWideChar(CP_UTF8, 0, &str[0], (int)str.size(), &res[0], sz);
    return res;
}

You then use _wfsopen (from the link you provided) to open the file with a UTF-16 name. The example below uses _wfopen_s, which takes the same kind of wide-character filename.

int main()
{
    //UTF8 source:
    std::string filename_u8;

    //This line works in VS2015 only
    //For older version comment out the next line, obtain UTF8 from another source
    filename_u8 = u8"c:\\test\\__ελληνικά.txt";

    //convert to UTF16
    std::wstring filename_utf16 = getU16(filename_u8);

    FILE *file = NULL;
    _wfopen_s(&file, filename_utf16.c_str(), L"w");
    if (file)
    {
        //Add BOM, optional...

        //Write the file name in to file, for testing...
        fwrite(filename_u8.data(), 1, filename_u8.length(), file);

        fclose(file);
    }
    else
    {
        std::cout << "access denied, or folder doesn't exist...\n";
    }

    return 0;
}


Edit, getting ANSI from UTF8, using GetACP()

std::wstring string_to_wstring(const std::string &str, int codepage)
{
    if (str.empty()) return std::wstring();
    int sz = MultiByteToWideChar(codepage, 0, &str[0], (int)str.size(), 0, 0);
    std::wstring res(sz, 0);
    MultiByteToWideChar(codepage, 0, &str[0], (int)str.size(), &res[0], sz);
    return res;
}

std::string wstring_to_string(const std::wstring &wstr, int codepage)
{
    if (wstr.empty()) return std::string();
    int sz = WideCharToMultiByte(codepage, 0, &wstr[0], (int)wstr.size(), 0, 0, 0, 0);
    std::string res(sz, 0);
    WideCharToMultiByte(codepage, 0, &wstr[0], (int)wstr.size(), &res[0], sz, 0, 0);
    return res;
}

std::string get_ansi_from_utf8(const std::string &utf8, int codepage)
{
    std::wstring utf16 = string_to_wstring(utf8, CP_UTF8);
    std::string ansi = wstring_to_string(utf16, codepage);
    return ansi;
}
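If you would rather keep using boost::locale::conv::from_utf(), the number returned by GetACP() can probably be turned into a charset name. This is only a minimal sketch: charset_name_from_codepage is a hypothetical helper, not part of Boost.Locale, and it assumes that the "windows-NNNN" spelling, which the question's test program shows working for 1251 and 1252, is also accepted for whatever code page is active.

```cpp
#include <string>

// Hypothetical helper (not part of Boost.Locale): maps a Windows code page
// number, for example the value returned by GetACP(), to a charset name.
// Assumption: the "windows-NNNN" spelling is accepted by the conversion
// backend for the code page in question; this is only confirmed for
// 1251 and 1252 by the test program in the question.
std::string charset_name_from_codepage(unsigned int codepage)
{
    if (codepage == 65001)  // 65001 is the UTF-8 code page (CP_UTF8)
        return "UTF-8";
    return "windows-" + std::to_string(codepage);
}
```

On Windows this would be called as boost::locale::conv::from_utf(utf8_name, charset_name_from_codepage(GetACP())). If the name is not recognized, from_utf() throws boost::locale::conv::invalid_charset_error, as described in the question. Passing boost::locale::conv::stop as the third argument makes the conversion throw instead of silently skipping characters that the target code page cannot represent.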

Barmak's way is the best way to do it.

To clear up the locale stuff: the process always starts with the "C" locale. You can use the setlocale function to set the locale to the system default or to any arbitrary locale.

#include <clocale>

int main()
{
    // Query the current locale without changing it
    const char *current = setlocale(LC_ALL, NULL);
    (void)current;

    // Set locale to system default
    setlocale(LC_ALL, "");

    // Set locale to German
    setlocale(LC_ALL, "de-DE");

    return 0;
}
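The return value of setlocale is worth looking at, since it is the name of the locale in effect. The sketch below wraps the query form and adds a small parser for names of the form German_Germany.1252, the string the question reports for std::locale("").name(). Both function names are made up for illustration, and the assumption that the code page follows the last '.' is based only on that one example.

```cpp
#include <clocale>
#include <string>

// Returns the name of the locale currently in effect without changing it.
// Passing a null pointer makes setlocale a pure query.
std::string current_locale_name()
{
    const char *name = std::setlocale(LC_ALL, nullptr);
    return name ? std::string(name) : std::string();
}

// Hypothetical helper: returns the text after the last '.' in a locale name
// such as "German_Germany.1252", or an empty string if there is no '.'.
std::string codepage_suffix(const std::string &locale_name)
{
    const std::string::size_type dot = locale_name.rfind('.');
    if (dot == std::string::npos)
        return std::string();
    return locale_name.substr(dot + 1);
}
```

Before any call that changes the locale, current_locale_name() returns "C". After setlocale(LC_ALL, ""), the name depends on the platform and user settings, so treat the parsed suffix as a hint rather than a guaranteed code page number.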
