If a Unicode character (code point) that is unsupported by Windows cmd, e.g. EN DASH "–", is printed with Python 3 in a Windows cmd terminal using:
print('\u2013')
then an exception is raised:
UnicodeEncodeError: 'charmap' codec can't encode character '\u2013' in position 0: character maps to <undefined>
Is there a way to make print convert unsupported characters to e.g. "?", or otherwise handle the print so that execution can continue?
Update
There is a better way... see below.
There must be a better way, but this is all I can think of at the moment:
import sys
print('\u2013'.encode(sys.stdout.encoding, errors='replace').decode(sys.stdout.encoding))
This uses encode() to encode the Unicode string to the console's encoding (sys.stdout.encoding), replacing characters that are not valid for that encoding with "?". The encoding has to be passed explicitly, because encode() without an encoding argument defaults to UTF-8 in Python 3 and would leave the character unchanged. The result is a bytes string, which is then decoded back to a str, preserving the replaced characters.
Here is an example using a code point that is not valid in GBK encoding:
>>> s = 'abc\u3020def'
>>> print(s)
abc〠def
>>> s.encode(encoding='gbk')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'gbk' codec can't encode character '\u3020' in position 3: illegal multibyte sequence
>>> s.encode(encoding='gbk', errors='replace')
b'abc?def'
>>> s.encode(encoding='gbk', errors='replace').decode()
'abc?def'
>>> print(s.encode(encoding='gbk', errors='replace').decode())
abc?def
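The round trip shown above can be wrapped in a small helper. This is only a sketch: the name `safe_print` and its parameters are made up for illustration, and the ASCII fallback for streams that report no encoding is an assumption rather than anything from the original answer.

```python
import sys


def safe_print(text, stream=None, errors="replace"):
    """Print text, substituting characters the stream's encoding cannot represent."""
    stream = stream if stream is not None else sys.stdout
    # Assumption of this sketch: fall back to ASCII if the stream reports no encoding.
    encoding = getattr(stream, "encoding", None) or "ascii"
    # Round-trip through the stream's own encoding so unsupported
    # characters are replaced before print() ever sees them.
    cleaned = text.encode(encoding, errors=errors).decode(encoding)
    print(cleaned, file=stream)
```

Calling `safe_print('Hello\u2013world!')` on a console whose code page lacks EN DASH should print the text with "?" in its place instead of raising. Passing `errors='backslashreplace'` keeps the code point visible as an escape sequence.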
Update
So there is a better way, as mentioned by @eryksun in the comments. Once it is set up, no other code needs to change for unsupported characters to be replaced. The code below demonstrates the before and after behaviour (I have set my preferred encoding to GBK):
>>> import os, sys
>>> print('\u3030')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'gbk' codec can't encode character '\u3030' in position 0: illegal multibyte sequence
>>> old_stdout = sys.stdout
>>> fd = os.dup(sys.stdout.fileno())
>>> sys.stdout = open(fd, mode='w', errors='replace')
>>> old_stdout.close()
>>> print('\u3030')
?
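On Python 3.7 and later, the same effect can probably be had without reopening the file descriptor, because `io.TextIOWrapper` has a `reconfigure()` method that accepts `errors`. The sketch below is not from the original answer: `make_lenient` is a hypothetical helper name, and an in-memory ASCII stream stands in for the console so that the behaviour does not depend on the local code page.

```python
import io


def make_lenient(stream, errors="replace"):
    """Switch a text stream's error handler in place (Python 3.7+)."""
    # reconfigure() exists on io.TextIOWrapper; other stream types may lack it,
    # in which case this sketch leaves the stream untouched.
    if hasattr(stream, "reconfigure"):
        stream.reconfigure(errors=errors)
    return stream


# Demonstration on an in-memory ASCII stream standing in for a console.
demo = io.TextIOWrapper(io.BytesIO(), encoding="ascii", newline="\n")
make_lenient(demo)
print("Hello\u2013world!", file=demo)
demo.flush()
result = demo.buffer.getvalue()  # the en dash has been replaced with '?'
```

In a real script the call would be `make_lenient(sys.stdout)` near startup, after which ordinary `print` calls stop raising for unsupported characters.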
@eryksun's comment mentions setting the Windows environment variable:
PYTHONIOENCODING=:replace
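Note the ":" before "replace". The value has the form encoding:errorhandler, so leaving the part before the colon empty keeps the default encoding and changes only the error handler. This looks like a usable answer that does not require any changes in Python scripts using print.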
With this variable set, print('\u2013') results in:
?
and print('Hello\u2013world!') results in:
Hello?world!
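The environment variable can also be set for a single child process, which is a convenient way to try it out. The following is a minimal sketch: `run_with_io_encoding` is a hypothetical helper, and it uses `ascii:replace` rather than `:replace` purely so that the result does not depend on the machine's default encoding.

```python
import os
import subprocess
import sys


def run_with_io_encoding(code, io_encoding="ascii:replace"):
    """Run a Python snippet in a child process with PYTHONIOENCODING set."""
    # Copy the current environment and add the variable for the child only.
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    completed = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        check=True,
    )
    return completed.stdout  # raw bytes written by the child


# The escape is passed literally so the child builds the en dash itself.
output = run_with_io_encoding(r"print('Hello\u2013world!')")
```

Here `output` holds the bytes the child wrote to its stdout, with the en dash replaced by "?". Passing `io_encoding=":replace"` instead mirrors the setting from the comment and keeps whatever encoding the child would otherwise have used.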