
How to print unsupported unicode characters on Windows cmd as e.g. “?” instead of raising exception?

If a Unicode character (code point) that the Windows cmd console cannot encode, e.g. EN DASH "–" (U+2013), is printed with Python 3 in a Windows cmd terminal using:

print('\u2013')

Then an exception is raised:

UnicodeEncodeError: 'charmap' codec can't encode character '\u2013' in position 0: character maps to <undefined>

Is there a way to make print convert unsupported characters to e.g. "?", or otherwise handle the print so that execution can continue?

Update

There is a better way... see below.


There must be a better way, but this is all I can think of at the moment:

import sys
print('\u2013'.encode(sys.stdout.encoding, errors='replace').decode(sys.stdout.encoding))

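This uses encode() to encode the Unicode string to the stdout encoding, replacing characters that are not valid in that encoding with ?. That converts the string to a bytes object, which is then decoded back to a str with the same encoding, preserving the replaced characters. The encoding has to be passed explicitly: in Python 3, str.encode() and bytes.decode() default to UTF-8, which can represent every code point, so a bare encode(errors='replace') would replace nothing and print would still fail.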
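The same round trip can be wrapped in a small helper so it does not have to be repeated at every call site. This is a minimal sketch, assuming the target stream exposes an encoding attribute; safe_print is a hypothetical name, not part of Python.

```python
import sys


def safe_print(*args, file=None, sep=" ", end="\n"):
    """Print like print(), but replace characters the target stream
    cannot encode with '?' instead of raising UnicodeEncodeError.

    safe_print is a hypothetical helper, not a standard function.
    """
    stream = sys.stdout if file is None else file
    # Fall back to UTF-8 when the stream reports no encoding (e.g. a
    # StringIO has encoding None); then nothing needs replacing.
    encoding = getattr(stream, "encoding", None) or "utf-8"
    text = sep.join(str(arg) for arg in args) + end
    # Round-trip through the stream's own encoding: unencodable
    # characters become '?', and every remaining character is one the
    # stream can encode, so the final write cannot raise.
    cleaned = text.encode(encoding, errors="replace").decode(encoding)
    stream.write(cleaned)
```

Calling safe_print('\u2013') in a console whose encoding lacks EN DASH should print ? instead of raising.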

Here is an example using a code point that is not valid in GBK encoding:

>>> s = 'abc\u3020def'
>>> print(s)
abc〠def
>>> s.encode(encoding='gbk')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'gbk' codec can't encode character '\u3020' in position 3: illegal multibyte sequence

>>> s.encode(encoding='gbk', errors='replace')
b'abc?def'
>>> s.encode(encoding='gbk', errors='replace').decode(encoding='gbk')
'abc?def'

>>> print(s.encode(encoding='gbk', errors='replace').decode(encoding='gbk'))
abc?def
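errors='replace' is only one of Python's standard codec error handlers. The sketch below compares a few of them on the same string; try_handlers is a hypothetical helper, and ASCII stands in for the console encoding so the result does not depend on any particular code page table (the handlers work the same way with any codec, including 'gbk').

```python
def try_handlers(text, encoding):
    """Encode text with several standard error handlers and return the
    results keyed by handler name (try_handlers is a hypothetical name)."""
    results = {}
    for handler in ("replace", "ignore", "backslashreplace", "xmlcharrefreplace"):
        # Each handler decides what to emit for characters the codec
        # cannot represent, instead of raising UnicodeEncodeError.
        results[handler] = text.encode(encoding, errors=handler)
    return results


for name, encoded in try_handlers("abc\u3020def", "ascii").items():
    print(f"{name:18} {encoded!r}")
```

With 'backslashreplace' the output stays unambiguous at the cost of a longer string, while 'ignore' silently drops the character.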

Update

As @eryksun pointed out in the comments, there is a better way. Once it is set up, no further code changes are needed for unsupported characters to be replaced. The session below demonstrates the behaviour before and after (I have set my preferred encoding to GBK):

>>> import os, sys
>>> print('\u3030')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'gbk' codec can't encode character '\u3030' in position 0: illegal multibyte sequence

>>> old_stdout = sys.stdout
>>> fd = os.dup(sys.stdout.fileno())
>>> sys.stdout = open(fd, mode='w', errors='replace')
>>> old_stdout.close()

>>> print('\u3030')
?
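On Python 3.7 and later, a possibly simpler alternative to duplicating the file descriptor is to change the error handler of the existing stream in place with io.TextIOWrapper.reconfigure(). This is a different technique from the one in the session above, and use_replace_errors is a hypothetical name; the sketch assumes sys.stdout is a regular TextIOWrapper (some IDEs substitute their own stream objects that lack reconfigure()).

```python
import sys


def use_replace_errors(stream=None):
    """Make a text stream replace unencodable characters with '?'.

    A sketch assuming Python 3.7+, where io.TextIOWrapper.reconfigure()
    exists; use_replace_errors is a hypothetical helper name.
    """
    stream = sys.stdout if stream is None else stream
    # reconfigure() flushes pending output, then switches only the
    # error handler; the encoding and underlying file stay the same.
    stream.reconfigure(errors="replace")
    return stream
```

Calling use_replace_errors() once at startup should make a later print('\u2013') output ? on a console whose encoding lacks EN DASH.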

@eryksun's comment also mentions setting the Windows environment variable:

PYTHONIOENCODING=:replace

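Note the ":" before "replace": the value uses the syntax encodingname:errorhandler, and leaving the encoding name empty keeps the default encoding while changing only the error handler. This looks like a usable answer that does not require any changes to Python scripts that use print.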
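The effect of the variable can be checked from Python by starting a child interpreter with it set. This sketch uses subprocess; run_with_ioencoding is a hypothetical helper, and its default mirrors the :replace value above.

```python
import os
import subprocess
import sys


def run_with_ioencoding(code, ioencoding=":replace"):
    """Run a Python snippet in a child interpreter with PYTHONIOENCODING
    set and return the child's raw stdout bytes.

    run_with_ioencoding is a hypothetical helper for illustration.
    """
    env = dict(os.environ)
    # Syntax is "encodingname:errorhandler"; an empty encoding name keeps
    # the default encoding and changes only the error handler.
    env["PYTHONIOENCODING"] = ioencoding
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout
```

For example, run_with_ioencoding("print('Hello\\u2013world!')", "ascii:replace") should return b'Hello?world!' followed by a newline. ASCII is forced there so the replacement happens on any platform; with :replace the outcome depends on the child's default encoding.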

With that variable set, print('\u2013') results in:

?

and print('Hello\u2013world!') results in:

Hello?world!


 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM