How to print unsupported Unicode characters as "?" on Windows cmd instead of raising an exception?

Question

If a Unicode character (code point) that Windows cmd does not support, e.g. EN DASH "–", is printed with Python 3 in a Windows cmd terminal using:

print('\u2013')

Then an exception is raised:

UnicodeEncodeError: 'charmap' codec can't encode character '\u2013' in position 0: character maps to <undefined>

Is there a way to make print convert unsupported characters to e.g. "?", or otherwise handle the print so that execution can continue?
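To see why print() fails, the error can be reproduced without a console by encoding with a legacy code page directly, since on the affected Python versions print() encodes text with sys.stdout.encoding. This is a minimal sketch assuming the console uses code page 437 (a common US-English default; the actual code page depends on the system locale), which has no EN DASH.

```python
# Simulate what print() does on a console whose code page is cp437
# (assumption: the real code page depends on the system locale).
text = '\u2013'  # EN DASH

try:
    text.encode('cp437')  # strict error handling, the default used by print()
except UnicodeEncodeError as exc:
    print('strict:', exc.reason)  # the "character maps to <undefined>" part

# With errors='replace' the unencodable character becomes b'?'.
print('replace:', text.encode('cp437', errors='replace'))
```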

Answer 1

Update

There is a better way; see below.

There must be a better way, but this is all I can think of at the moment:

import sys
print('\u2013'.encode(sys.stdout.encoding, errors='replace').decode(sys.stdout.encoding))

This uses encode() to encode the Unicode string to the console's encoding (sys.stdout.encoding), replacing characters that are not valid in that encoding with ?. That converts the string to a bytes object, which is then decoded back to a str, preserving the replacement characters. The encoding must be passed explicitly: in Python 3, str.encode() defaults to UTF-8, which can encode every character, so nothing would be replaced.
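The same encode/decode round trip can be wrapped in a small helper so that call sites stay short. This is a hedged sketch: safe_print is a hypothetical name, and it assumes the target stream exposes an encoding attribute, falling back to UTF-8 when it is None (as for io.StringIO). The usage lines simulate a cp437 console with a strict text layer over a byte buffer.

```python
import io
import sys


def safe_print(*args, sep=' ', end='\n', file=None, flush=False):
    """Hypothetical helper: like print(), but characters that the target
    stream's encoding cannot represent are replaced with '?'."""
    if file is None:
        file = sys.stdout
    # io.StringIO reports encoding None; assume UTF-8 there (nothing to replace).
    encoding = getattr(file, 'encoding', None) or 'utf-8'
    if sep is None:
        sep = ' '
    if end is None:
        end = '\n'
    text = sep.join(str(arg) for arg in args) + end
    # Round-trip through the stream's encoding so unencodable characters become '?'.
    safe = text.encode(encoding, errors='replace').decode(encoding)
    file.write(safe)
    if flush:
        file.flush()


# Usage: a strict cp437 text stream standing in for the console.
raw = io.BytesIO()
console = io.TextIOWrapper(raw, encoding='cp437', errors='strict')
safe_print('Hello\u2013world!', file=console)
console.flush()
print(raw.getvalue())
```

The line ending in the buffer is os.linesep, so it differs between Windows and other platforms.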

Here is an example using a code point that is not valid in the GBK encoding:

>>> s = 'abc\u3020def'
>>> print(s)
abc〠def
>>> s.encode(encoding='gbk')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'gbk' codec can't encode character '\u3020' in position 3: illegal multibyte sequence

>>> s.encode(encoding='gbk', errors='replace')
b'abc?def'
>>> s.encode(encoding='gbk', errors='replace').decode()
'abc?def'

>>> print(s.encode(encoding='gbk', errors='replace').decode())
abc?def
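errors='replace' is only one of the built-in error handlers accepted by str.encode(). As a sketch, again assuming a cp437 console rather than GBK purely for illustration, the alternatives below keep more information about the dropped character, or drop it silently:

```python
text = 'Hello\u2013world!'

for handler in ('replace', 'backslashreplace', 'xmlcharrefreplace', 'ignore'):
    # Encode for a cp437 console, then decode back to str for printing.
    shown = text.encode('cp437', errors=handler).decode('cp437')
    print(f'{handler:>17}: {shown}')
# replace gives Hello?world!, backslashreplace gives Hello\u2013world!,
# xmlcharrefreplace gives Hello&#8211;world!, ignore gives Helloworld!
```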

Update

So there is a better way, as @eryksun mentioned in the comments. Once it is set up, no other code needs to change to get unsupported characters replaced. The code below demonstrates the behaviour before and after (I have set my preferred encoding to GBK):

>>> import os, sys
>>> print('\u3030')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'gbk' codec can't encode character '\u3030' in position 0: illegal multibyte sequence

>>> old_stdout = sys.stdout
>>> fd = os.dup(sys.stdout.fileno())
>>> sys.stdout = open(fd, mode='w', encoding=old_stdout.encoding, errors='replace')
>>> old_stdout.close()

>>> print('\u3030')
?
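On Python 3.7 and later, io.TextIOWrapper.reconfigure() can change the error handler of the existing sys.stdout in place, which avoids duplicating the file descriptor. The sketch below falls back to detaching the binary buffer and wrapping it in a new text layer on older versions. The function name use_replace_errors is hypothetical, and the demo uses a strict cp437 stream over a byte buffer in place of a real console.

```python
import io
import sys


def use_replace_errors(stream):
    """Hypothetical helper: return a text stream that writes '?' for
    characters its encoding cannot represent."""
    if hasattr(stream, 'reconfigure'):  # Python 3.7+
        stream.reconfigure(errors='replace')
        return stream
    # Older Python 3: detach the binary buffer (detach() flushes first, and the
    # old wrapper no longer owns the buffer) and wrap it in a new text layer.
    encoding = stream.encoding
    buffer = stream.detach()
    return io.TextIOWrapper(buffer, encoding=encoding,
                            errors='replace', line_buffering=True)


# Typical use at program start-up:
#     sys.stdout = use_replace_errors(sys.stdout)

# Demo with a strict cp437 stream standing in for the console.
raw = io.BytesIO()
console = io.TextIOWrapper(raw, encoding='cp437', errors='strict')
console = use_replace_errors(console)
print('Hello\u2013world!', file=console)
console.flush()
print(raw.getvalue())
```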

Answer 2

@eryksun's comment mentions setting the Windows environment variable:

PYTHONIOENCODING=:replace

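Note the ":" before "replace": the variable has the form encodingname:errorhandler, and leaving the encoding name empty keeps the default encoding while changing only the error handler. This looks like a usable answer, since it does not require any changes to Python scripts that use print.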
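The effect of the variable can be checked from Python by launching a child interpreter with it set. This sketch uses ascii:replace rather than :replace so that the result does not depend on the platform: on a UTF-8 terminal, :replace alone would keep UTF-8 and nothing would need replacing. The ascii encoding is an assumption standing in for the cmd code page.

```python
import os
import subprocess
import sys

# Copy the current environment and set the stdio encoding and error handler.
# "ascii" stands in for a limited console code page; ":replace" is the part
# that turns unencodable characters into "?".
env = dict(os.environ, PYTHONIOENCODING='ascii:replace')

result = subprocess.run(
    # The child receives the escape \u2013 as plain ASCII and decodes it itself.
    [sys.executable, '-c', "print('Hello\\u2013world!')"],
    env=env,
    stdout=subprocess.PIPE,
    check=True,
)
print(result.stdout)
```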

print('\u2013') results in:

?

and print('Hello\u2013world!') results in:

Hello?world!


Answer 1: 4 votes, accepted, 2016-03-08 09:56:41

Answer 2: 1 vote, 2016-03-08 10:24:16
