How do I get Cyrillic in the output, Python?
How do I get Cyrillic instead of u'...?
The code is like this:
import codecs

def openfile(filename):
    # Decode the file's bytes as UTF-8 while reading
    with codecs.open(filename, encoding="utf-8") as F:
        raw = F.read()

# do stuff...
print some_text
prints:
>>>[u'.', u',', u':', u'\u0432', u'<', u'>', u'(', u')', u'\u0437', u'\u0456']
It looks like some_text is a list of unicode objects. When you print such a list, it prints the reprs of the elements inside the list. So instead try:
print(u''.join(some_text))
The join method concatenates the elements of some_text, with the empty string u'' as the separator between the elements. The result is one unicode object.
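As a minimal sketch of what join does here (the sample list is made up for illustration, since the question does not show how some_text is built):

```python
# -*- coding: utf-8 -*-
# Hypothetical sample data standing in for the question's some_text.
some_text = [u'.', u',', u'\u0432', u'\u0437', u'\u0456']

# An empty separator glues the elements into one unicode string.
joined = u''.join(some_text)

# A non-empty separator is placed between the elements instead.
spaced = u' '.join(some_text)
```

Printing joined shows the characters themselves rather than their escaped reprs, provided the terminal encoding can represent them.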
It's not clear to me where some_text comes from (you cut out that bit of your code), so I have no idea why it prints as a list of characters rather than a string.
But you should be aware that Python 2 may try to encode unicode strings as ASCII when you print them, for example when it cannot detect the terminal's encoding. If you want them to be encoded with some other codec, you can do that explicitly:
>>> text = u'\u0410\u0430\u0411\u0431'
>>> print text
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-3: ordinal not in range(128)
>>> print text.encode('utf8')
АаБб
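The explicit encoding step could be wrapped roughly as follows. This is only a sketch: encode_for_output is a hypothetical helper name, and falling back to UTF-8 with the 'replace' error handler is an assumption, not something the answer prescribes.

```python
import sys

def encode_for_output(text, encoding=None):
    """Encode unicode text to bytes explicitly (hypothetical helper)."""
    if encoding is None:
        # sys.stdout.encoding can be None (e.g. when output is piped),
        # so fall back to UTF-8 as an assumed default.
        encoding = getattr(sys.stdout, 'encoding', None) or 'utf-8'
    # 'replace' substitutes '?' for unencodable characters instead of raising.
    return text.encode(encoding, 'replace')

text = u'\u0410\u0430\u0411\u0431'
utf8_bytes = encode_for_output(text, 'utf-8')
ascii_bytes = encode_for_output(text, 'ascii')
```

With 'ascii' every Cyrillic letter degrades to a question mark, which avoids the traceback but loses the text; UTF-8 keeps it intact.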
u'\uNNNN' is the ASCII-safe version of the string literal u'з':
>>> print u'\u0437'
з
However, this will only display correctly if your console supports the character you are trying to print. Trying the above on the console of a Western European Windows install fails:
>>> print u'\u0437'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\encodings\cp437.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\u0437' in position 0: character maps to <undefined>
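Whether a given console code page can represent a character can be checked ahead of time. The sketch below uses a hypothetical helper, can_encode; cp437 is the codec named in the traceback, while cp866 (a DOS Cyrillic code page) is included only for comparison and is not mentioned in the original answer.

```python
def can_encode(text, encoding):
    """Return True if text is representable in the given codec (hypothetical helper)."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True

# cp437 has no Cyrillic letters; cp866 does.
supported_cp437 = can_encode(u'\u0437', 'cp437')
supported_cp866 = can_encode(u'\u0437', 'cp866')
```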
Because getting the Windows console to output Unicode is tricky, Python 2's repr function always opts for the ASCII-safe literal version.
Your print statement is outputting the repr version and not printing characters directly because you've got them inside a list of characters instead of a string. If you did print on each of the members of the list, you'd get the characters output directly and not represented as u'...' string literals.
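The ASCII-safe escaped form can also be produced and reversed explicitly with the unicode_escape codec. This is a small sketch on made-up sample data, meant to show that the escapes and the characters carry the same information; it is not what repr does internally.

```python
# Hypothetical sample data standing in for the question's some_text.
some_text = [u'\u0432', u'\u0437', u'\u0456']

# ASCII-only escaped bytes for each element, e.g. \u0437 for the letter ze.
escaped = [ch.encode('unicode_escape') for ch in some_text]

# Decoding the escapes restores the original characters.
restored = [e.decode('unicode_escape') for e in escaped]
```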