
Problems with Python's encoding

First, I tested the default encoding at the command line:

>>> import sys
>>> print sys.getdefaultencoding()
ascii

Then I assigned a Chinese character to a variable:

>>> s = "汉"
>>> print s
汉

So my question is: why can ASCII display a Chinese character?

The default encoding doesn't apply here; it is only used when implicitly converting between Unicode and bytestring values.

You created a bytestring in your terminal. Your terminal encoded the character, and Python stored the resulting bytes. Printing those bytes sends them back to the terminal, which decodes them again.

For example, if your terminal is configured to use UTF-8, you'll see this when echoing s:

>>> s = "汉"
>>> s
'\xe6\xb1\x89'

Those are 3 UTF-8 bytes, and printing them back to the terminal produces data that the terminal knows how to decode again:

>>> print s
汉

Note that in a terminal environment, the interactive prompt uses the terminal encoding it detected to decode input when creating Unicode objects:

>>> import sys
>>> sys.stdin.encoding
'UTF-8'
>>> unicode_string = u"汉"
>>> unicode_string
u'\u6c49'
>>> print unicode_string
汉

Printing automatically encodes the Unicode object again to match the terminal encoding. This is in contrast to string literals in Python source code in a .py file, where you have to declare the file codec using a PEP 263 header.
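To see why the terminal encoding matters for Unicode objects, here is a small sketch that encodes the same character with two codecs. The choice of UTF-8 and GBK is an assumption made for illustration; a real terminal reports its own codec through sys.stdin.encoding and sys.stdout.encoding.

```python
# Sketch only: the codec names stand in for whatever the terminal uses.
text = u"\u6c49"  # U+6C49, the character from the session above

# Printing a Unicode value encodes it with the terminal's codec,
# so the bytes sent to the terminal depend on that codec.
as_utf8 = text.encode("utf-8")
as_gbk = text.encode("gbk")
print(len(as_utf8), len(as_gbk))

# Decoding with the codec that produced the bytes restores the character.
restored_utf8 = as_utf8.decode("utf-8")
restored_gbk = as_gbk.decode("gbk")

# Decoding with a different codec fails for these bytes.
try:
    mismatched = as_gbk.decode("utf-8")
except UnicodeDecodeError:
    mismatched = None
```

The same Unicode value maps to different byte sequences, which is why the prompt has to know the terminal encoding in both directions.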

Last but not least, sys.getdefaultencoding() is used for implicit conversions, for example when concatenating a byte string with a Unicode value:

>>> unicode_string + s
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe6 in position 0: ordinal not in range(128)
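One way to avoid that error is to decode the byte string explicitly before combining the two values. The sketch below assumes the bytes are UTF-8, as in the session above. It reproduces the failing ASCII decode by hand so that it behaves the same on Python 2 and Python 3; the variable names are hypothetical.

```python
# Sketch: reproduce the implicit ASCII decode, then do it explicitly.
unicode_string = u"\u6c49"
byte_string = unicode_string.encode("utf-8")  # same bytes as s above

# Python 2 decodes the byte string with the default (ASCII) codec during
# concatenation; performing that decode by hand raises the same error.
try:
    byte_string.decode("ascii")
    implicit_ok = True
    error_message = ""
except UnicodeDecodeError as exc:
    implicit_ok = False
    error_message = str(exc)
print(error_message)

# Decoding explicitly with the codec that produced the bytes works.
combined = unicode_string + byte_string.decode("utf-8")
print(len(combined))
```

Decoding at the boundary and working with Unicode values internally keeps the default encoding out of the picture.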
