
Why does Python 3 output \xe3, an extra char?

Why does Python add \xe3 to the output of:

>>> b'Transa\xc3\xa7\xc3\xa3o'.decode('utf-8')
'Transaç\xe3o'

Expected value is:

'Transação'
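The decode step itself already produces the expected text. Only the echoed representation differs. A quick check in plain Python, with no console involved, shows that no extra character was added:

```python
# Decode the UTF-8 bytes from the question
s = b'Transa\xc3\xa7\xc3\xa3o'.decode('utf-8')

# The decoded string matches the expected value exactly
print(s == 'Transação')    # True
print(len(s))              # 9: no extra character was added
# "\xe3" is only the escape for U+00E3 (LATIN SMALL LETTER A WITH TILDE)
print(hex(ord(s[7])))      # 0xe3
print(ascii(s))            # 'Transa\xe7\xe3o'
```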

Some more information about my environment:

>>> import sys
>>> print (sys.version)
3.4.3 (v3.4.3:9b73f1c3e601, Feb 24 2015, 22:44:40) [MSC v.1600 64 bit (AMD64)]   
>>> sys.stdout.encoding
'cp437'

This was under Console 2 + PowerShell.

You need to use a console or terminal that supports all of the characters that you want to print.

When the interactive console echoes the result of an expression, as in your example, the text is encoded with your console's codec. Any character that codec cannot represent is written using the backslashreplace error handler, which keeps the output readable instead of raising an exception. This is a feature of the default sys.displayhook() function:

If repr(value) is not encodable to sys.stdout.encoding with sys.stdout.errors error handler (which is probably 'strict'), encode it to sys.stdout.encoding with 'backslashreplace' error handler.
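That fallback can be reproduced without a cp437 console. The sketch below is a simplified model of the rule, not CPython's actual implementation: echo_as is a hypothetical helper, it assumes the console stream uses the 'strict' error handler, and it returns the resulting text instead of writing bytes to sys.stdout.buffer.

```python
def echo_as(value, console_encoding):
    """Hypothetical helper: return the text the interactive prompt would
    show for `value` on a console using `console_encoding` (simplified)."""
    text = repr(value)
    try:
        # Assumes a 'strict' stream: succeeds only if every character fits
        text.encode(console_encoding)
        return text
    except UnicodeEncodeError:
        # Fallback from the sys.displayhook docs: escape what cannot be encoded
        data = text.encode(console_encoding, 'backslashreplace')
        return data.decode(console_encoding)

s = b'Transa\xc3\xa7\xc3\xa3o'.decode('utf-8')
shown = echo_as(s, 'cp437')
# shown == "'Transaç\\xe3o'", which a cp437 console displays as 'Transaç\xe3o'
```

On a UTF-8 console the same call takes the first branch and returns the repr unchanged.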

Your console can handle ç but not ã. There are several codecs that include the first character but not the last; you are using IBM code page 437, but it is by no means the only one.
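You can test which codecs cover each character by attempting a strict encode. In this sketch the CANDIDATES list is an arbitrary, hypothetical selection of standard Python codec names; only the cp437 result is stated here, so run it to see the others:

```python
# Hypothetical selection of standard codecs to compare; extend as needed
CANDIDATES = ['cp437', 'cp850', 'cp852', 'cp865', 'cp1252', 'latin-1', 'utf-8']

def can_encode(char, codec):
    """Return True if `char` has a representation in `codec`."""
    try:
        char.encode(codec)  # strict error handler by default
        return True
    except UnicodeEncodeError:
        return False

for codec in CANDIDATES:
    # cp437 reports c-cedilla True, a-tilde False
    print('{:8} c-cedilla={!s:5} a-tilde={!s:5}'.format(
        codec, can_encode('ç', codec), can_encode('ã', codec)))
```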

If you are running Python in the standard Windows console (cmd.exe), be aware that Python, Unicode and that console do not mix very well. You can install the win-unicode-console package to make Python 3 use the Windows APIs to output Unicode text; you will still need a console font capable of displaying that text.

I don't know for certain if that package is compatible with other Windows shells; your mileage may vary.
