How to print unsupported Unicode characters in Windows cmd as e.g. “?” instead of raising an exception?
If a Unicode character (code point) that is unsupported by Windows cmd, e.g. EN DASH "–", is printed with Python 3 in a Windows cmd terminal using:
print('\u2013')
Then an exception is raised:
UnicodeEncodeError: 'charmap' codec can't encode character '\u2013' in position 0: character maps to <undefined>
Is there a way to make print convert unsupported characters to e.g. "?", or otherwise handle the print to allow execution to continue?
Update
There is a better way... see below.
There must be a better way, but this is all I can think of at the moment:
import sys
print('\u2013'.encode(sys.stdout.encoding, errors='replace').decode(sys.stdout.encoding))
This uses encode() to encode the Unicode string to the console's encoding (sys.stdout.encoding), replacing characters that are not valid for that encoding with ?. The encoding has to be named explicitly: in Python 3, str.encode() defaults to UTF-8, which can represent every code point and so would replace nothing. That converts the string to a bytes string, which is then decoded back to a str, preserving the replaced characters.
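The encode/decode round trip can be wrapped in a small helper so that call sites stay readable. This is only a sketch: the name to_printable and the ASCII fallback are assumptions made for illustration, not part of the original answer.

```python
import sys

def to_printable(text, encoding=None):
    """Return text with characters the target encoding cannot represent replaced by '?'."""
    # Hypothetical helper: use the console encoding unless one is passed in,
    # and fall back to ASCII if stdout reports no encoding at all.
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "ascii"
    return text.encode(encoding, errors="replace").decode(encoding)

# cp437 (a common Windows console code page) has no EN DASH.
print(to_printable("Hello\u2013world!", encoding="cp437"))
# Uses whatever encoding the current stdout reports.
print(to_printable("Hello\u2013world!"))
```

Passing the encoding explicitly makes the behaviour reproducible on machines whose console already supports the character.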
Here is an example using a code point that is not valid in GBK encoding:
>>> s = 'abc\u3020def'
>>> print(s)
abc〠def
>>> s.encode(encoding='gbk')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'gbk' codec can't encode character '\u3020' in position 3: illegal multibyte sequence
>>> s.encode(encoding='gbk', errors='replace')
b'abc?def'
>>> s.encode(encoding='gbk', errors='replace').decode()
'abc?def'
>>> print(s.encode(encoding='gbk', errors='replace').decode())
abc?def
Update
So there is a better way, as mentioned by @eryksun in the comments. Once it is set up, no other code needs to change for unsupported characters to be replaced. The code below demonstrates the behaviour before and after (I have set my preferred encoding to GBK):
>>> import os, sys
>>> print('\u3030')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'gbk' codec can't encode character '\u3030' in position 0: illegal multibyte sequence
>>> old_stdout = sys.stdout
>>> fd = os.dup(sys.stdout.fileno())
>>> sys.stdout = open(fd, mode='w', errors='replace')
>>> old_stdout.close()
>>> print('\u3030')
?
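On Python 3.7 and later, the same effect may be achievable without duplicating the file descriptor, because text streams expose a reconfigure() method. This is an alternative to the session above, not something the original answer used. The sketch below runs against an in-memory stream with the cp437 code page (an assumption chosen for repeatability) so the result does not depend on the real console.

```python
import io

# In-memory stand-in for a console stream that uses a legacy code page.
raw = io.BytesIO()
stream = io.TextIOWrapper(raw, encoding="cp437", errors="strict", newline="\n")

# Switch the error handler in place; the encoding is left unchanged.
stream.reconfigure(errors="replace")

print("Hello\u2013world!", file=stream)
stream.flush()
output = raw.getvalue()
print(output)

# In a real script the equivalent call would be:
#     sys.stdout.reconfigure(errors="replace")
```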
A comment by @eryksun mentions setting a Windows environment variable:
PYTHONIOENCODING=:replace
Note the ":" before "replace": the part before the colon is the encoding name, and leaving it empty keeps the default encoding while changing only the error handler. This looks like a usable answer that does not require any changes in Python scripts using print.
With this set, print('\u2013') results in:
?
and print('Hello\u2013world!') results in:
Hello?world!
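To try the environment variable without changing system settings, a child interpreter can be started with it set. The sketch below uses "ascii:replace" rather than ":replace" purely so that the outcome is the same on any machine; that choice is an assumption for the demonstration, since with ":replace" the result depends on the console's default encoding.

```python
import os
import subprocess
import sys

# Copy the current environment and add the variable for the child process only.
env = dict(os.environ, PYTHONIOENCODING="ascii:replace")

# The raw string passes the \u2013 escape through to the child interpreter unchanged.
result = subprocess.run(
    [sys.executable, "-c", r"print('Hello\u2013world!')"],
    env=env,
    capture_output=True,
    check=True,
)

# Strip the trailing newline, which differs between platforms.
child_output = result.stdout.strip()
print(child_output)
```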