Convert Japanese wstring to std::string

Can anyone suggest a good method to convert a Japanese std::wstring to std::string?

I used the code below. Japanese strings are not converted properly on an English OS.

std::string WstringTostring(std::wstring str)
{
    size_t size = 0;
    _locale_t lc = _create_locale(LC_ALL, "ja.JP.utf8");
    errno_t err = _wcstombs_s_l(&size, NULL, 0, &str[0], _TRUNCATE, lc);
    std::string ret = std::string(size, 0);
    err = _wcstombs_s_l(&size, &ret[0], size, &str[0], _TRUNCATE, lc);
    _free_locale(lc);
    ret.resize(size-1);
    return ret;
}

The wstring is "C\\files\\ブ種別.pdf".

The converted string is "C:\\files\\ブ種別.pdf".

It actually looks right to me.

That is the UTF-8-encoded version of your input (which presumably was UTF-16 before conversion), but it is being displayed as if it were a single-byte encoding rather than UTF-8, due to a mistake somewhere in your toolchain.

You just need to configure your file/terminal/display to render the text output as UTF-8 (which it is).
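If the display in question is a Windows console, one thing that may help is switching its output code page to UTF-8. The following is a minimal sketch, not a complete fix: `use_utf8_console` is a hypothetical helper name, while `SetConsoleOutputCP` and `CP_UTF8` are the real Win32 names. It assumes the console font also contains Japanese glyphs, which this code cannot guarantee.

```cpp
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: asks the attached console to interpret program
// output as UTF-8. On non-Windows platforms it does nothing and
// reports success, since terminals there usually default to UTF-8.
inline bool use_utf8_console()
{
#ifdef _WIN32
    return SetConsoleOutputCP(CP_UTF8) != 0;
#else
    return true;
#endif
}
```

Calling `use_utf8_console()` once at the start of `main`, before writing the converted string to `std::cout`, should be enough for this sketch.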


Also, remember that std::string is just a container of bytes, and does not inherently specify or imply any particular encoding. So your question is really "how can I convert UTF-16 (containing Japanese characters) into UTF-8 in Windows" or, as it turns out, "how do I configure my terminal to display UTF-8?".
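To make the "UTF-16 in, UTF-8 bytes out" idea concrete, here is a portable sketch of the conversion itself. It is an illustration, not the asker's method: `append_utf8` and `utf16_to_utf8` are hypothetical helper names, and the input is a `std::u16string` rather than a `std::wstring` so that the code behaves the same on every platform (on Windows, `wchar_t` holds UTF-16 code units, and `WideCharToMultiByte` with `CP_UTF8` is the usual API for this job).

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Appends one Unicode code point to `out` as 1 to 4 UTF-8 bytes.
inline void append_utf8(std::string& out, std::uint32_t cp)
{
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: converts UTF-16 code units to a UTF-8 byte string.
// Throws std::invalid_argument on an unpaired surrogate.
inline std::string utf16_to_utf8(const std::u16string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {  // high surrogate: needs a partner
            if (i + 1 >= in.size() || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF)
                throw std::invalid_argument("unpaired high surrogate");
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;  // the low surrogate has been consumed
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            throw std::invalid_argument("unpaired low surrogate");
        }
        append_utf8(out, cp);
    }
    return out;
}
```

For example, `utf16_to_utf8(u"\u30D6")` (the character ブ) yields the three bytes E3 83 96. The resulting std::string carries no record that those bytes are UTF-8; whatever displays them has to be told.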

If your display for this string is the Visual Studio locals window (which you suggest is the case with your comment "I observed the value of the "ret" string in local window while debugging"), you are out of luck, because VS has no idea what encoding your string is in (nor does it attempt to find out).
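When the viewer cannot be trusted to decode the string, one way to check the conversion is to look at the raw bytes instead. The sketch below assumes nothing about the encoding; `to_hex` is a hypothetical helper name introduced only for illustration.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: renders each byte of `s` as two uppercase hex
// digits, separated by single spaces.
inline std::string to_hex(const std::string& s)
{
    std::string out;
    char buf[4];  // two hex digits, a space, and the terminating NUL
    for (unsigned char c : s) {
        std::snprintf(buf, sizeof buf, "%02X ", c);
        out += buf;
    }
    if (!out.empty())
        out.pop_back();  // drop the trailing space
    return out;
}
```

If the conversion produced UTF-8, the file-name part ブ種別 should appear as `E3 83 96 E7 A8 AE E5 88 A5`, regardless of how the debugger or terminal chooses to draw those bytes.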

For other aspects of Visual Studio, though, such as the console output window, there are various approaches to work around this (example).

EDIT: some things first. Windows has the notion of the ANSI codepage. It's the default codepage that Windows assumes for non-Unicode strings. Every program that uses the non-Unicode versions of the Windows API, and doesn't specify the codepage explicitly, uses the ANSI codepage.

The ANSI codepage is driven by the "System default locale" setting in Control Panel. As of Windows 10 (May 2020), it's under Region / Administrative / Change system locale. It takes admin rights to change that setting.

By default, Windows with the system default locale set to English uses codepage 1252 as the ANSI codepage. That codepage doesn't contain the Japanese characters. So using Japanese in Unicode-unaware programs in that situation is hard or impossible.

It looks like the OP wants or has to use a piece of third-party C++ code that uses multibyte strings (std::string and/or char*). That doesn't necessarily mean that it's Unicode-unaware, but it might be. Whether what the OP is trying to do can work depends entirely on how that third-party library is coded. It might not be possible at all.


Looks like your problem is that some piece of third-party code expects a file name in ANSI, and uses ANSI functions to open that file. On an English system with the default value of the system locale, Japanese can't be converted to ANSI, because the ANSI codepage (CP1252 in practice) doesn't contain the Japanese characters.

What I think you should do is get a short file name instead, using GetShortPathNameW, convert that file path to ANSI, and pass that string. Like this:

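#include <windows.h>
#include <stdlib.h>
#include <string>

std::string WstringFilenameTostring(const std::wstring& str)
{
    wchar_t ShortPath[MAX_PATH+1];
    DWORD dw = GetShortPathNameW(str.c_str(), ShortPath, _countof(ShortPath));
    if (dw == 0 || dw >= _countof(ShortPath))
        return std::string(); // lookup failed (e.g. file not found) or buffer too small

    char AnsiPath[MAX_PATH+1];
    int n = WideCharToMultiByte(CP_ACP, 0, ShortPath, -1, AnsiPath, _countof(AnsiPath), 0, 0);
    if (n == 0)
        return std::string(); // conversion to the ANSI codepage failed
    return std::string(AnsiPath);
}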

This code is for filenames only. For any other Japanese string, it will return nonsense. In my test, it converted "日本語.txt" to something unreadable but alphanumeric :)
