Input encoding problem in Windows C++

Question

I am developing a simple console application with Visual Studio 2013:

#include <cstdlib>
#include <iostream>
#include <string>
#include <tchar.h>

int _tmain(int argc, _TCHAR* argv[])
{
    std::wstring name;
    std::wcout << L"Enter your name: ";
    std::wcin >> name;
    std::wcout << L"Hello, " << name << std::endl;
    system("pause");
    return 0;
}

If I enter Ángel as input, the application works well and the output is:

Hello, Ángel

The problem is that if I put a breakpoint on

std::wcout << L"Hello, " << name << std::endl;

the Visual Studio debugger shows

+       name    L"µngel"    std::basic_string<wchar_t,std::char_traits<wchar_t>,std::allocator<wchar_t> >

The console output is correct, but elsewhere in the program I call the Win32 API function CopyFileW(), and it always fails: the path contains the substring Ángel, and the substring actually passed to the function is µngel.

Answer 1

The problem is that Windows consoles are broken by default.

It arises because Windows uses a different 8-bit codepage in console applications than in GUI applications. On Western versions of Windows, the default 8-bit codepage (called ANSI) is Windows-1252, while the console 8-bit codepage (called OEM) is CP850.

Since your program doesn't know whether it is reading from the console or from a redirected file, it simply assumes ANSI input. But when you type Á, the byte that arrives is its CP850 code: 0xB5. That byte is then interpreted using Windows-1252 as µ, that is, the Unicode character U+00B5. The funny thing is that when you print it to the console, the inverse transformation happens and you see an Á again. Two wrongs make a right!
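
The mismatch described above can be reproduced without a console at all. The following is only a minimal sketch: the helper names (decode_cp850, decode_as_cp1252, decode_as_cp850) are made up for illustration, the CP850 table contains just a few entries, and the Windows-1252 decoder ignores the 0x80..0x9F range, where that codepage differs from Latin-1.

```cpp
#include <string>

// Hypothetical helper: decode one CP850 byte. Only a few accented letters
// are listed; every other byte is passed through unchanged, which is only
// correct for ASCII.
wchar_t decode_cp850(unsigned char b) {
    switch (b) {
        case 0xB5: return static_cast<wchar_t>(0x00C1); // A with acute
        case 0xA0: return static_cast<wchar_t>(0x00E1); // a with acute
        case 0x82: return static_cast<wchar_t>(0x00E9); // e with acute
        default:   return static_cast<wchar_t>(b);
    }
}

// Hypothetical helper: decode bytes as Windows-1252. For ASCII and for
// 0xA0..0xFF the code point equals the byte value; 0x80..0x9F is not
// handled correctly by this sketch.
std::wstring decode_as_cp1252(const std::string& bytes) {
    std::wstring out;
    for (unsigned char b : bytes) out.push_back(static_cast<wchar_t>(b));
    return out;
}

// Decode bytes as CP850, using the partial table above.
std::wstring decode_as_cp850(const std::string& bytes) {
    std::wstring out;
    for (unsigned char b : bytes) out.push_back(decode_cp850(b));
    return out;
}
```

Feeding the bytes the console delivers for Ángel (0xB5 followed by "ngel") through decode_as_cp1252 gives a string starting with U+00B5, which is what the debugger shows, while decode_as_cp850 gives the intended U+00C1.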

But when you use that character in a non-console context, such as a file path, it really is a µ.

You may think that you can convert from OEM to ANSI and then from ANSI to Unicode, and that would seem to work... until you run your program as:

c:\> myprogram < input.txt

You wrote input.txt with Notepad, so it is already ANSI, and now you are applying a conversion you do not need.

You might then say that you could detect whether you are reading from the actual console or from a redirection, and do the OEM-to-ANSI conversion only when there is no redirection... until you do:

c:\> echo Ángel | myprogram

And you are doing it wrong again: the input is redirected, yet the bytes coming through the pipe are still in the OEM codepage.
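
For reference, the detection step mentioned above might look roughly like this. It is a sketch, not a fix: stdin_is_console is a name invented here, and the non-Windows branch exists only so the snippet compiles elsewhere. As the echo example shows, knowing that stdin is a pipe still does not tell you which codepage the bytes are in.

```cpp
#ifdef _WIN32
#include <windows.h>
#else
#include <unistd.h>
#endif

// Hypothetical helper: returns true when standard input is attached to an
// interactive console rather than a file or a pipe.
bool stdin_is_console() {
#ifdef _WIN32
    DWORD mode = 0;
    // GetConsoleMode succeeds only for real console handles; it fails for
    // redirected files and pipes.
    return GetConsoleMode(GetStdHandle(STD_INPUT_HANDLE), &mode) != 0;
#else
    // POSIX fallback, included only for portability of the sketch.
    return isatty(STDIN_FILENO) != 0;
#endif
}
```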

There are a lot of alternatives, but none of them works perfectly. At the very least, you should use a Unicode font and a more sensible codepage: something like chcp 1252 changes the console codepage to match the ANSI one. You can even make that the default with a bit of registry foo:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\CodePage\OEMCP=1252
