
UTF-8 and colour printing woes…

I have a console program that prints its output in wonderful colour. Errors go through the following code; the two trivial calls at the bottom show how it is used.

# coding: utf-8
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
from sys import stderr
from colored import fg
from colored import attr
from locale import getpreferredencoding

def format_error(x):
    return '{0}{1}{2}'.format(fg(88), x, attr('reset'))

def print_error(x):
    msg = format_error('✗  {0}\n'.format(x))
    stderr.write(msg.encode(getpreferredencoding()))

print_error(str('ook'))
print_error(unicode(b'café', 'UTF-8'))

I have no control over what x is; it could be anything. Also, parts of this script are called from a GUI that captures stdout/stderr via glib-spawn-async. As a result, I get UnicodeDecodeError errors from time to time. I have read the Unicode HOWTO, but I am clearly missing something.

How can I harden my code so that UnicodeDecodeError is never raised?

For example, within a gtk.textview I get the following, whereas on the console all is fine. The trace has been cut to remove irrelevant data.

  File "/home/usr/nifty_logger.py", line 96, in print_success
    sys.stdout.write(msg.encode(getpreferredencoding()))
  File "/home/usr/.virtualenvs/rprs_bootstrap/lib64/python2.7/codecs.py", line 351, in write
    data, consumed = self.encode(object, self.errors)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 0: ordinal not in range(128)

The encode() method takes an optional argument that defines the error handling:

str.encode([encoding[, errors]])

From the docs:

Return an encoded version of the string. Default encoding is the current default string encoding. errors may be given to set a different error handling scheme. The default for errors is 'strict', meaning that encoding errors raise a UnicodeError. Other possible values are 'ignore', 'replace', 'xmlcharrefreplace', 'backslashreplace' and any other name registered via codecs.register_error(), see section Codec Base Classes. For a list of possible encodings, see section Standard Encodings.
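To see what each of the handlers named in that passage does, here is a small comparison, offered as an illustrative sketch. The sample string and the names `handlers` and `results` are made up for the example, and ASCII is used as the target only because it cannot represent either non-ASCII character.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals

# Sample text (an assumption for this sketch): two characters ASCII lacks.
text = '✗ café'

handlers = ('ignore', 'replace', 'xmlcharrefreplace', 'backslashreplace')

# Encode the same text once per handler and keep the resulting bytes.
results = dict((name, text.encode('ascii', name)) for name in handlers)

for name in handlers:
    print(name, repr(results[name]))

# 'strict' is the default: it raises instead of substituting anything.
try:
    text.encode('ascii')
except UnicodeEncodeError as exc:
    print('strict raised:', exc.reason)
```

Of the four, only 'xmlcharrefreplace' and 'backslashreplace' keep enough information to recover the original characters, which matters when the output is a log someone has to read later.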

In your case:

msg.encode(getpreferredencoding(), 'backslashreplace')
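One caveat worth hedging: the errors argument governs encoding failures, while the traceback reports a decode failure, which in Python 2 typically comes from a byte string being decoded as ASCII implicitly. A minimal sketch of one way to combine the two, assuming incoming byte strings are UTF-8, follows. The helper names `to_text` and `encode_error`, the `text_type` alias, and the UTF-8 assumption are all hypothetical and not part of the original code; colouring is left out to keep the sketch free of third-party imports.

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals

from locale import getpreferredencoding

# With unicode_literals this is `unicode` on Python 2 and `str` on Python 3.
text_type = type('')


def to_text(x, source_encoding='utf-8'):
    """Coerce x to text; source_encoding is an assumed encoding for bytes."""
    if isinstance(x, text_type):
        return x
    if isinstance(x, bytes):
        # Decode explicitly so no implicit ASCII decode happens later;
        # undecodable bytes become U+FFFD instead of raising.
        return x.decode(source_encoding, 'replace')
    # Anything else: fall back to its text representation.
    return text_type(x)


def encode_error(x, target_encoding=None):
    """Build the error line as text, then encode it exactly once."""
    line = '✗  {0}\n'.format(to_text(x))
    encoding = target_encoding or getpreferredencoding()
    # Characters the target encoding lacks become backslash escapes.
    return line.encode(encoding, 'backslashreplace')
```

The bytes returned by `encode_error(x)` are what the original `print_error` would hand to `stderr.write`. The codecs.py frame in the traceback suggests that stdout was already wrapped in a codecs stream writer in that run; if so, the writer expects text, and passing it the result of `to_text()` rather than encoded bytes would avoid the implicit ASCII decode.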
