
How to print tuples of unicode strings in original language (not u'foo' form)

I have a list of tuples of unicode objects:

>>> t = [('亀',), ('犬',)]

Printing this out, I get:

>>> print t
[('\xe4\xba\x80',), ('\xe7\x8a\xac',)]

which I guess is a list of the UTF-8 byte representations of those strings?

but what I want to see printed out is, surprise:

[('亀',), ('犬',)]

but I'm having an inordinate amount of trouble getting the bytes back into a human-readable form.

but what I want to see printed out is, surprise:

[('亀',), ('犬',)]

What do you want to see it printed out on? Because if it's the console, it's not at all guaranteed your console can display those characters. This is why Python's repr() representation of objects goes for the safe option of \-escapes, which you will always be able to see on-screen and type in easily.

As a prerequisite you should be using Unicode strings (u''). And, as mentioned by Matthew, if you want to be able to write u'亀' directly in source you need to make sure Python can read the file's encoding. For occasional use of non-ASCII characters it is best to stick with the escaped version u'\u4e80', but when you have a lot of East Asian text you want to be able to read, "# coding=utf-8" is definitely the way to go.

print '[%s]' % ', '.join([', '.join('(%s,)' % ', '.join(ti) for ti in t)])

That would print the characters without the surrounding quotes. Really you'd want:

def reprunicode(u):
    return repr(u).decode('raw_unicode_escape')

print u'[%s]' % u', '.join([u'(%s,)' % reprunicode(ti[0]) for ti in t])

This would work, but if the console doesn't support Unicode (and this is especially troublesome on Windows), you'll get a big old UnicodeError.
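One way to sidestep that error is to encode with the `backslashreplace` error handler, so that characters the console cannot represent fall back to escapes instead of raising. The sketch below is written in Python 3 syntax, which is an assumption since the snippets in this thread are Python 2, and the helper name `console_safe` is hypothetical.

```python
import sys

def console_safe(text, encoding=None):
    """Return text with characters the target encoding lacks turned into escapes."""
    # Assumption: fall back to ASCII when the stream reports no encoding.
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "ascii"
    # backslashreplace swaps unencodable characters for \uXXXX-style escapes.
    return text.encode(encoding, errors="backslashreplace").decode(encoding)

# Forcing ASCII shows the fallback: the two characters come out as escapes.
print(console_safe("[('亀',), ('犬',)]", encoding="ascii"))

# With no explicit encoding the helper adapts to whatever stdout reports,
# so this line should not raise even on a console that lacks the characters.
print(console_safe("[('亀',), ('犬',)]"))
```

On a UTF-8 console the second call prints the characters themselves; elsewhere it degrades to escapes.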

In any case, this rarely matters because the repr() of an object, which is what you're seeing here, doesn't usually make it to the public user interface of an application; it's really for the coder only.

However, you'll be pleased to know that Python 3.0 behaves exactly as you want:

  • plain '' strings without the 'u' prefix are now Unicode strings
  • repr() shows most Unicode characters verbatim
  • Unicode in the Windows console is better supported (you can still get UnicodeError on Unix if your environment isn't UTF-8)

Python 3.0 is a little bit new and not so well-supported by libraries, but it might well suit your needs better.
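As a small illustration of the points above, here is a Python 3 sketch of how the list from the question behaves. It compares strings rather than printing the characters, so it runs even where stdout is not UTF-8; the variable names are only for illustration.

```python
t = [('亀',), ('犬',)]

# In Python 3, repr() leaves printable non-ASCII characters as they are.
verbatim = repr(t)
print(verbatim == "[('亀',), ('犬',)]")  # True

# ascii() is the escape-everything form, close to what Python 2's repr()
# showed for unicode objects (without the u prefix).
escaped = ascii(t)
print(escaped)  # [('\u4e80',), ('\u72ac',)]
```

So `print(t)` gives the desired output directly, provided the console encoding can represent the characters.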

First, there's a slight misunderstanding in your post. If you define a list like this:

>>> t = [('亀',), ('犬',)]

...those are not unicode objects you define, but str objects. If you want to have unicode types, you have to add a u before the string literal:

>>> t = [(u'亀',), (u'犬',)]

But let's assume you actually want str objects, not unicode objects. The main problem is that the __str__ method of a list (or a tuple) is practically equal to its __repr__ method (which returns a string that, when evaluated, would create exactly the same object). Because the __repr__ method should be encoding-independent, strings are represented in the safest mode possible, i.e. each byte outside the ASCII range is represented as a hex escape (\xe4, for example).
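The container behaviour described here can be seen with a tiny example. This is a sketch in Python 3 syntax (an assumption, since the thread targets Python 2, though the rule is the same in both); the `Label` class is hypothetical and exists only to make the two methods distinguishable.

```python
class Label:
    """Toy object whose str() and repr() are visibly different."""

    def __init__(self, text):
        self.text = text

    def __str__(self):
        return self.text

    def __repr__(self):
        return "Label(%r)" % self.text

item = Label("dog")

print(str(item))     # dog               (uses __str__)
print(str([item]))   # [Label('dog')]    (list formats its elements with repr)
print(str((item,)))  # (Label('dog'),)   (tuple does the same)
```

Printing the element directly goes through `__str__`, while printing the container goes through each element's `__repr__`, which is why the strings in the question show up escaped.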

Unfortunately, as far as I know, there's no library method for printing a list that is locale-aware. You could use an almost-general-purpose function like this:

def collection_str(collection):
    if isinstance(collection, list):
        brackets = '[%s]'
        single_add = ''
    elif isinstance(collection, tuple):
        brackets = '(%s)'
        single_add = ','
    else:
        return str(collection)
    items = ', '.join([collection_str(x) for x in collection])
    if len(collection) == 1:
        items += single_add
    return brackets % items

>>> print collection_str(t)
[(亀,), (犬,)]

Note that this won't work for all possible collections (sets and dictionaries, for example), but it's easy to extend it to handle those.
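One possible extension to sets and dictionaries might look like the following. It is a sketch in Python 3 syntax (an assumption, as the original function is Python 2), and the choices for dict and empty-set formatting are mine rather than part of the answer above. Like the original, it uses str() for the leaves, so strings appear without quotes.

```python
def collection_str(collection):
    """Format nested containers using str() instead of repr() for the leaves."""
    if isinstance(collection, dict):
        # Assumption: render dicts as {key: value, ...} with both sides formatted.
        pairs = ['%s: %s' % (collection_str(k), collection_str(v))
                 for k, v in collection.items()]
        return '{%s}' % ', '.join(pairs)
    if isinstance(collection, list):
        brackets, single_add = '[%s]', ''
    elif isinstance(collection, tuple):
        brackets, single_add = '(%s)', ','
    elif isinstance(collection, set):
        if not collection:
            return 'set()'  # '{}' would read as an empty dict
        brackets, single_add = '{%s}', ''
    else:
        return str(collection)
    items = ', '.join(collection_str(x) for x in collection)
    if len(collection) == 1:
        items += single_add  # keep the trailing comma of 1-tuples
    return brackets % items

print(collection_str({'pets': [('dog',), ('cat',)]}))  # {pets: [(dog,), (cat,)]}
```

Set elements come out in iteration order, which is arbitrary, so sort them first if stable output matters.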

Python 2 source code files are treated as ASCII by default, so you must use the \u escape sequences unless you specify an encoding. See PEP 0263.

#!/usr/bin/python
# coding=utf-8
t = [u'亀', u'犬']
print t

When you pass a list to print, Python converts the object into a string using Python's rules for string conversions. The output of such conversions is designed for eval(), which is why you see those \u sequences. Here's a hack to get around that, based on bobince's solution. The console must accept Unicode or this will throw an exception.

t = [(u'亀',), (u'犬',)]
print repr(t).decode('raw_unicode_escape')

So this appears to do what I want:

print '[%s]' % ', '.join([', '.join('(%s,)' % ', '.join(ti) for ti in t)])


>>> t = [('亀',), ('犬',)]
>>> print t
[('\xe4\xba\x80',), ('\xe7\x8a\xac',)]
>>> print '[%s]' % ', '.join([', '.join('(%s,)' % ', '.join(ti) for ti in t)])
[(亀,), (犬,)]

Surely there's a better way to do it.

(But the other two answers thus far don't result in the original string being printed out as desired.)

It seems people are missing what is wanted here. When I print unicode from a tuple, I just want to get rid of the 'u', '[', '(' and quotes. What we want is a function like the one below. After scouring the Net, it seems to be the cleanest way to get atomic displayable data. If the data is not in a tuple or list, I don't think this problem exists.

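import re

def Plain(self, U_String):
    # str() of a 1-tuple looks like "(u'...',)" or "('...',)" with escapes inside
    P_String = str(U_String)
    m = re.search(r"^\((u?)'(.*)',\)$", P_String)
    if m:  # Typical unicode
        if m.group(1):  # u'' literal: undo the \uXXXX escapes
            P_String = m.group(2).decode("unicode_escape")
        else:  # byte string: undo the \xNN escapes, then decode the UTF-8 bytes
            P_String = m.group(2).decode("string_escape").decode("utf8")
    return P_String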

Try:

import codecs, sys
sys.stdout = codecs.getwriter('utf8')(sys.stdout)
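To see what that wrapper does, here is a sketch in Python 3 syntax (an assumption, since the snippet above is Python 2) that wraps an in-memory byte buffer instead of the real stdout. The names `raw` and `writer` are only for illustration.

```python
import codecs
import io

raw = io.BytesIO()                      # stands in for a byte-oriented stdout
writer = codecs.getwriter('utf8')(raw)  # same kind of wrapper as in the snippet above

# Text written to the wrapper is encoded to UTF-8 before reaching the stream.
writer.write("[('亀',), ('犬',)]")

encoded = raw.getvalue()
print(encoded)  # the UTF-8 bytes that a console would receive
```

Whether those bytes render as the intended characters still depends on the console expecting UTF-8. In Python 3.7 and later, `sys.stdout.reconfigure(encoding='utf-8')` is a more direct way to get a similar effect.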
