
Problems with Python's encoding

First, I checked the default encoding on the command line:

>>> import sys
>>> print sys.getdefaultencoding()
ascii

Then I assigned a Chinese character to a variable:

>>> s = "汉"
>>> print s
汉

So my question is: why can Python display a Chinese character when the default encoding is ASCII?

The default encoding doesn't apply here; it is only used when implicitly converting between Unicode and bytestring values.

You created a bytestring in your terminal. Your terminal encoded the character, and Python stored the resulting bytes. Printing those bytes sends them back to the terminal, which decodes them again.

For example, if your terminal is configured to use UTF-8, you'll see this when echoing s:

>>> s = "汉"
>>> s
'\xe6\xb1\x89'

Those are 3 UTF-8 bytes, and printing those back to the terminal results in data that the terminal knows how to decode again:

>>> print s
汉
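The round trip described above can be sketched without relying on any terminal. This is a minimal illustration, not part of the original session: it starts from the code point U+6C49 (汉) and encodes it explicitly, so the variable names and the choice of GBK as a second codec are assumptions made for the example. It is written to run on both Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
# Hypothetical names; the point is that the same character maps to
# different byte sequences depending on the codec the terminal uses.
text = u"\u6c49"  # the character 汉 as a Unicode object

utf8_bytes = text.encode("utf-8")  # what a UTF-8 terminal would send
gbk_bytes = text.encode("gbk")     # what a GBK terminal would send

print(repr(utf8_bytes))  # three bytes: e6 b1 89
print(len(utf8_bytes))
print(len(gbk_bytes))

# Decoding with the same codec that produced the bytes restores the character.
round_trip = utf8_bytes.decode("utf-8")
print(round_trip == text)
```

Python itself never interprets the bytes in s here; only the codec used at each end decides which character they represent.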

Note that in a terminal environment, the interactive prompt makes use of the terminal encoding it detected to decode input when creating Unicode objects:

>>> import sys
>>> sys.stdin.encoding
'UTF-8'
>>> unicode_string = u"汉"
>>> unicode_string
u'\u6c49'
>>> print unicode_string
汉

Printing automatically encodes the Unicode object again to match the terminal encoding. This is in contrast to string literals in Python source code in a .py file, where you have to declare the file codec using a PEP 263 header.
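For reference, a PEP 263 header is a specially formatted comment on the first or second line of the source file. The sketch below assumes the file itself is saved as UTF-8; the variable name is only illustrative.

```python
# -*- coding: utf-8 -*-
# The comment above tells Python 2 how to decode non-ASCII bytes in this file.
# (Python 3 already assumes UTF-8 when no header is present.)
s = u"汉"  # decoded using the declared codec into one Unicode code point

print(len(s))
print(s == u"\u6c49")
```

Without the header, Python 2 rejects a source file containing non-ASCII bytes with a SyntaxError.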

Last but not least, sys.getdefaultencoding() is used for implicit conversions, for example when concatenating a byte string with a Unicode value:

>>> unicode_string + s
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe6 in position 0: ordinal not in range(128)
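One way to avoid the error is to decode the bytes explicitly before combining them. The sketch below assumes the bytes are UTF-8 (as in the terminal example above) and spells out the ASCII decode that Python 2 would otherwise attempt implicitly; it runs on both Python 2 and Python 3.

```python
unicode_string = u"\u6c49"  # the Unicode object for 汉
s = b"\xe6\xb1\x89"         # the UTF-8 bytes for 汉

# This is the conversion Python 2 attempts implicitly, using the
# default encoding; it fails because 0xe6 is outside the ASCII range.
try:
    s.decode("ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# Decoding with the codec that actually produced the bytes works.
combined = unicode_string + s.decode("utf-8")

print(ascii_failed)
print(len(combined))
```

Decoding at the point where bytes enter the program, and working with Unicode objects from then on, keeps the default encoding out of the picture entirely.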
