How do I get Cyrillic in the output, Python?

How do I get Cyrillic instead of u'...?

The code is like this:

import codecs

def openfile(filename):
    # codecs.open decodes the file's bytes as UTF-8, so raw is a unicode object
    with codecs.open(filename, encoding="utf-8") as F:
        raw = F.read()
# do stuff...
print some_text

It prints:

>>>[u'.', u',', u':', u'\u0432', u'<', u'>', u'(', u')', u'\u0437', u'\u0456']

It looks like some_text is a list of unicode objects. When you print such a list, it prints the reprs of the elements inside the list. So instead try:

print(u''.join(some_text))

The join method concatenates the elements of some_text, with the empty string, u'', as the separator between the elements. The result is one unicode object.
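To make the difference concrete, here is a minimal sketch. The contents of some_text are an assumption based on the characters shown in the question, and the names listing and joined are illustrative only. It is written to run on both Python 2.7 and Python 3.

```python
# Assumed sample data: a few of the characters shown in the question.
some_text = [u'.', u',', u'\u0432', u'\u0437', u'\u0456']

# Printing a list goes through repr() of each element; on Python 2 that
# gives the escaped u'\u0432' form for every non-ASCII character.
listing = repr(some_text)

# Joining with the empty string yields one unicode object instead of a list.
joined = u''.join(some_text)

# A different separator works the same way, e.g. one space between elements.
spaced = u' '.join(some_text)
```

Calling print(joined) then shows the characters themselves, provided the terminal's encoding supports Cyrillic.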

It's not clear to me where some_text comes from (you cut out that bit of your code), so I have no idea why it prints as a list of characters rather than a string.

But you should be aware that Python 2 encodes unicode strings with sys.stdout.encoding when you print them, and falls back to ASCII when the terminal's encoding cannot be determined (for example, under a C/POSIX locale or when output is piped). If you want them to be encoded in some other encoding, you can do that explicitly:

>>> text = u'\u0410\u0430\u0411\u0431'
>>> print text
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-3:
  ordinal not in range(128)
>>> print text.encode('utf8')
АаБб
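If the same script has to run on terminals with different encodings, the explicit encode can be wrapped in a small helper. The following is only a sketch: encode_for_output is a hypothetical name, and falling back to UTF-8 with errors="replace" is an assumption about what you might want, not something Python does by default.

```python
import sys


def encode_for_output(text, encoding=None, errors="replace"):
    """Encode a unicode string to bytes for writing to a byte stream.

    Hypothetical helper: uses the given encoding, else the encoding
    reported by sys.stdout, else UTF-8 (an assumed fallback).
    """
    if encoding is None:
        encoding = getattr(sys.stdout, "encoding", None) or "utf-8"
    # With errors="replace", characters the encoding lacks become "?"
    # instead of raising UnicodeEncodeError.
    return text.encode(encoding, errors)


text = u'\u0410\u0430\u0411\u0431'
utf8_bytes = encode_for_output(text, "utf-8")
ascii_bytes = encode_for_output(text, "ascii")
```

On Python 2, print encode_for_output(text, 'utf-8') writes the bytes as they are. On Python 3 you would write them to sys.stdout.buffer instead, since print shows the repr of a bytes object.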

u'\uNNNN' is the ASCII-safe version of the string literal u'з':

>>> print u'\u0437'
з
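The equivalence can be checked without printing anything non-ASCII. This is a small sketch using only the standard library; the variable names are illustrative.

```python
import unicodedata

ze = u'\u0437'  # escaped spelling of the Cyrillic letter shown above

# The escape denotes a single code point, not six separate characters.
length = len(ze)
code_point = ord(ze)

# The Unicode database reports which letter that code point is.
name = unicodedata.name(ze)

# The unicode_escape codec produces the ASCII-safe spelling as bytes.
escaped = ze.encode('unicode_escape')
```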

However, this will only display correctly if your console supports the character you are trying to print. Trying the above on the console of a Western European Windows install fails:

>>> print u'\u0437'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\encodings\cp437.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\u0437' in position 0: character maps to <undefined>
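Whether a console can show a character comes down to whether its code page can encode it. Below is a minimal sketch of that check. can_encode is a hypothetical helper, and the code pages other than cp437 are only examples chosen for comparison.

```python
def can_encode(text, encoding):
    """Return True if every character of text exists in encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


ze = u'\u0437'

# cp437 is the code page from the traceback above; cp866 and cp1251
# are Cyrillic code pages, included here for comparison.
code_pages = ("cp437", "cp866", "cp1251", "utf-8")
support = dict((cp, can_encode(ze, cp)) for cp in code_pages)
```

In practice you would pass sys.stdout.encoding as the encoding, bearing in mind that on Python 2 it can be None when output is redirected.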

Because getting the Windows console to output Unicode is tricky, Python 2's repr function always opts for the ASCII-safe literal version.

Your print statement is outputting the repr version rather than printing the characters directly because you've got them inside a list of characters instead of a string. If you called print on each member of the list, you'd get the characters output directly rather than represented as u'...' string literals.
