
Windows console encoding

What is the default console encoding on Windows? It seems that sometimes it is the ANSI encoding (CP-1252), and sometimes it is the OEM encoding (CP-850 by default for Western Europe) reported by the chcp command.

  • Command-line arguments and environment variables trigger the ANSI encoding (é = 0xe9):

     > chcp 850
     Active code page: 850
     > python -c "print 'é'"
     Ú
     > python -c "print '\x82'"
     é
     > python -c "print '\xe9'"
     Ú
     > $env:foobar="é"; python -c "import os; print os.getenv('foobar')"
     Ú
     > chcp 1252
     Active code page: 1252
     > python -c "print 'é'"
     é
     > python -c "print '\x82'"
     ,
     > python -c "print '\xe9'"
     é
     > $env:foobar="é"; python -c "import os; print os.getenv('foobar')"
     é
  • The Python console and standard input trigger the OEM encoding (é = 0x82 if the OEM encoding is CP-850, é = 0xe9 if the OEM encoding is CP-1252):

     > chcp 850
     Active code page: 850
     > python
     >>> print 'é'
     é
     >>> print '\x82'
     é
     >>> print '\xe9'
     Ú
     > python -c "print raw_input()"
     é
     é
     > chcp 1252
     Active code page: 1252
     > python
     >>> print 'é'
     é
     >>> print '\x82'
     ,
     >>> print '\xe9'
     é
     > python -c "print raw_input()"
     é
     é

Note. In these examples, I used PowerShell 5.1 and CPython 2.7.14 on Windows 10.

First of all, for all your non-ASCII characters, what matters here is your console encoding and your Windows locale settings. You are using byte strings, and Python just prints out the bytes it received. Your keyboard input is encoded to a specific byte or byte sequence by the console before those bytes are passed on to Python. To Python, this is all just opaque data (numbers in the range 0-255), and print passes those bytes back to the console exactly as Python received them.
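To see why the same byte shows up as different characters, it can help to decode the two bytes from the question under both code pages. This is a minimal sketch written for Python 3 (the transcripts above are Python 2); the helper name `render` is made up for illustration, and the output is printed with `ascii()` escapes so it does not depend on the encoding of the terminal running it.

```python
# The two byte values that appear in the question.
BYTES = [0x82, 0xE9]
# The two code pages compared in the question.
CODE_PAGES = ["cp850", "cp1252"]


def render(byte_value, code_page):
    """Decode one byte the way a console using `code_page` would display it."""
    return bytes([byte_value]).decode(code_page)


for code_page in CODE_PAGES:
    for byte_value in BYTES:
        shown = render(byte_value, code_page)
        # ascii() keeps the report readable whatever the local console encoding is.
        print("{}: 0x{:02X} -> {}".format(code_page, byte_value, ascii(shown)))
```

Under CP-850 the byte 0x82 is é and 0xE9 is Ú; under CP-1252 the byte 0xE9 is é and 0x82 is the low quotation mark that looks like a comma in the transcripts.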

In PowerShell, the encoding used for the bytes sent to Python via command-line switches is not determined by the chcp codepage, but by the Language for non-Unicode programs setting in your control panel (search for Region, then find the Administrative tab). It is this setting that encodes é to 0xE9 before passing it to Python as a command-line argument. A large number of Windows codepages use 0xE9 for é (but there is no such thing as an ANSI encoding).

The same applies to environment variables. Python refers to the encoding Windows uses here as the MBCS codec; you can decode command-line parameters or environment variables to Unicode using the 'mbcs' codec, which uses the MultiByteToWideChar() and WideCharToMultiByte() Windows API functions with the CP_ACP flag.
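A small helper along these lines could do that decoding. It is only a sketch: `decode_ansi` is a hypothetical name, and because the 'mbcs' codec exists only on Windows, the fallback to 'cp1252' on other platforms is an assumption made so the example can run anywhere. In Python 2 the same idea would be `os.getenv('foobar').decode('mbcs')`.

```python
import sys


def decode_ansi(raw, codec=None):
    """Decode bytes that Windows encoded with the active ANSI code page."""
    if codec is None:
        # 'mbcs' is only registered on Windows. The cp1252 fallback is an
        # assumption for illustration, not what Windows would necessarily use.
        codec = "mbcs" if sys.platform == "win32" else "cp1252"
    return raw.decode(codec)


# 0xE9 is the byte the question observed for é in command-line arguments.
print(ascii(decode_ansi(b"\xe9", codec="cp1252")))
```

Passing the codec explicitly, as in the last line, makes the result independent of the machine's regional settings.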

When using the interactive prompt, Python is passed bytes as encoded by the PowerShell console codepage, set with chcp. For you that's codepage 850, and a byte with the hex value 0x82 is received when you type é. Because print sends the same 0x82 byte back to the same console, the console then translates 0x82 back to an é character on the screen.
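To check which code pages are actually in play on a given machine, one option is to ask Windows directly. The sketch below (Python 3) calls the Win32 functions GetACP, GetOEMCP, GetConsoleCP and GetConsoleOutputCP through ctypes; the wrapper name `windows_code_pages` is made up here, and on non-Windows systems it simply returns None.

```python
import ctypes
import sys


def windows_code_pages():
    """Return the ANSI, OEM and console code page numbers, or None off Windows."""
    if sys.platform != "win32":
        return None
    kernel32 = ctypes.windll.kernel32
    return {
        # Set by "Language for non-Unicode programs"; used for arguments.
        "ansi": kernel32.GetACP(),
        # The system default OEM code page.
        "oem": kernel32.GetOEMCP(),
        # The code pages of the attached console, as changed by chcp.
        "console_input": kernel32.GetConsoleCP(),
        "console_output": kernel32.GetConsoleOutputCP(),
    }


print(windows_code_pages())
```

On a setup like the one in the question, the ANSI entry would be expected to stay the same while the console entries follow chcp.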

Only when you use Unicode text (with a unicode string literal like u'é') would Python do any decoding and encoding of the data. print writes to sys.stdout, which is configured to encode Unicode data to the encoding of the current locale (or to PYTHONIOENCODING if set). So print u'é' writes that Unicode object to sys.stdout, which encodes it to bytes using the configured codec, and those bytes are then written to the console.
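The encoding step can be imitated with an in-memory stream. This is a Python 3 sketch, not how Python 2's sys.stdout is implemented: io.TextIOWrapper stands in for a text stream with a configured codec, and `bytes_written` is a hypothetical helper name.

```python
import io


def bytes_written(text, encoding):
    """Return the bytes a text stream configured with `encoding` would emit."""
    raw = io.BytesIO()
    # newline="\n" disables platform newline translation for a stable result.
    stream = io.TextIOWrapper(raw, encoding=encoding, newline="\n")
    stream.write(text)
    stream.flush()
    # Detach so the wrapper does not close the underlying buffer.
    stream.detach()
    return raw.getvalue()


for encoding in ("cp850", "cp1252", "utf-8"):
    print(encoding, bytes_written("\u00e9", encoding))
```

The same Unicode character é becomes 0x82 for a CP-850 stream, 0xE9 for a CP-1252 stream, and the two bytes 0xC3 0xA9 for UTF-8.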

To produce the unicode object from the u'é' source code text (itself a sequence of bytes), Python does have to decode the source code given. For the -c command line, the bytes that are passed in are decoded as Latin-1. In the interactive console, the locale is used. So python -c "print u'é'" and print u'é' in the interactive session will result in different output.
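The effect of the decoding codec can be traced by hand. The Python 3 sketch below assumes the literal arrives as the single byte 0xE9 and that stdout encodes with CP-850; both are assumptions chosen for illustration, and `console_bytes` is a made-up helper. It decodes the byte once as Latin-1 and once with a CP-850 locale, then re-encodes for the console.

```python
def console_bytes(source_bytes, source_codec, stdout_codec):
    """Decode a literal's bytes, then encode the result as stdout would."""
    text = source_bytes.decode(source_codec)
    return text.encode(stdout_codec)


LITERAL = b"\xe9"  # assumed bytes of the literal as handed to Python

# Decoded as Latin-1, 0xE9 is é, which CP-850 writes back as 0x82.
print(console_bytes(LITERAL, "latin-1", "cp850"))
# Decoded as CP-850, 0xE9 is Ú, which CP-850 writes back as 0xE9.
print(console_bytes(LITERAL, "cp850", "cp850"))
```

The two calls send different bytes to the console even though the input byte is identical, which is the kind of discrepancy described above.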

It should be noted that Python 3 uses Unicode strings throughout. Command-line parameters and environment variables are loaded into Python with the Windows 'wide' APIs, which expose the data as UTF-16, and are then presented as Unicode string objects. You can still access console data and filesystem information as byte strings, but as of Python 3.6, accessing the filesystem and stdin/stdout/stderr streams as binary uses UTF-8 encoded data (again using the 'wide' APIs).
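A quick way to confirm this on a given Python 3 installation is to inspect the types involved. The function name `describe_text_apis` below is hypothetical; sys.argv, os.environ and sys.getfilesystemencoding() are standard library names.

```python
import os
import sys


def describe_text_apis():
    """Summarise how Python 3 exposes arguments, environment and filenames."""
    return {
        # Every command-line argument is a str (Unicode), never bytes.
        "argv_types": {type(arg).__name__ for arg in sys.argv},
        # Environment variable names and values are str as well.
        "env_types": {type(value).__name__ for value in os.environ.values()},
        # The codec used when filenames are handled as bytes.
        "filesystem_encoding": sys.getfilesystemencoding(),
    }


print(describe_text_apis())
```

On Windows with Python 3.6 or later, the filesystem encoding entry is expected to report UTF-8, in line with the paragraph above.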
