
How to use a list of Python objects whose representation is unicode

I have an object which contains unicode data, and I want to use that data in its representation, e.g.

# -*- coding: utf-8 -*-

class A(object):

    def __unicode__(self):
        return u"©au"

    def __repr__(self):
        return unicode(self).encode("utf-8")

    __str__ = __repr__ 

a = A()


s1 = u"%s"%a # works
#s2 = u"%s"%[a] # gives unicode decode error
#s3 = u"%s"%unicode([a])  # gives unicode decode error

Now, even if I return unicode from __repr__, it still gives an error. So the question is: how can I use a list of such objects and create another unicode string out of it?

Platform details:

"""
Python 2.5.2 (r252:60911, Jul 31 2008, 17:28:52)
[GCC 4.2.3 (Ubuntu 4.2.3-2ubuntu7)] on linux2
'Linux-2.6.24-19-generic-i686-with-debian-lenny-sid'
""" 

I'm also not sure why:

print a # works
print unicode(a) # works
print [a] # works
print unicode([a]) # doesn't work

The Python group (comp.lang.python) answered this at http://groups.google.com/group/comp.lang.python/browse_thread/thread/bd7ced9e4017d8de/2e0b07c761604137?lnk=gst&q=unicode#2e0b07c761604137 as follows:

s1 = u"%s"%a # works

This works because, when dealing with a, Python uses its unicode representation (i.e. the __unicode__ method).

When you wrap it in a list such as [a], however, putting that list into the string calls unicode([a]), which for a list is the same as its repr: the string representation of the list, which uses repr(a) to represent your item in its output. This causes a problem, because repr(a) returns a str object (a string of bytes) containing the UTF-8 encoded version of a, and when the string formatting tries to embed that in your unicode string, it converts it back to a unicode object using the default encoding, i.e. ASCII. Since the UTF-8 bytes for © lie outside the ASCII range, the conversion fails.

What you want to do would have to be done this way: u"%s" % repr([a]).decode('utf-8'), assuming all your elements encode to UTF-8 (or ASCII, which is a subset of UTF-8).

For a better solution (if you still want to keep the string looking like a list's str), you would have to use what was suggested previously and use join, something like this:

u'[%s]' % u','.join(unicode(x) for x in [a,a])

Note, though, that this won't handle a list containing lists of your A objects.

My explanation sounds terribly unclear, but I hope you can make some sense of it.
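The join approach quoted above only handles one level of list. A minimal sketch of extending it to nested lists by recursion, written for Python 3 (where str() plays the role of Python 2's unicode()); to_text is a hypothetical helper name and this A is a Python 3 stand-in for the question's class, not code from the thread:

```python
class A(object):
    """Python 3 stand-in for the question's class: str() returns text."""

    def __str__(self):
        return "\u00a9au"  # "©au"


def to_text(value, sep=","):
    """Hypothetical helper: render (possibly nested) lists as text,
    calling str() on leaf objects instead of repr()."""
    if isinstance(value, list):
        # Recurse so that lists of lists of A objects are also rendered via str().
        return "[%s]" % sep.join(to_text(item, sep) for item in value)
    return str(value)


a = A()
flat = to_text([a, a])           # "[©au,©au]"
nested = to_text([a, [a, [a]]])  # "[©au,[©au,[©au]]]"
```

Only real lists are recursed into; tuples or other containers would fall through to str(), which is a deliberate simplification in this sketch.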

Try:

s2 = u"%s"%[unicode(a)] 

Your main problem is that you are doing more conversions than you expect. Let's consider the following:

s2 = u"%s"%[a] # gives unicode decode error

From the Python documentation:

's'     String (converts any python object using str()).
    If the object or format provided is a unicode string, 
    the resulting string will also be unicode.

When the %s format string is processed, str([a]) is applied. What you have at this point is a str object containing a sequence of UTF-8 encoded bytes. If you try to print it, there is no problem, because the bytes pass straight through to your terminal and are rendered by the terminal.

>>> x = "%s" % [a]
>>> print x
[©au]

The problem arises when you try to convert that back to unicode. Essentially, unicode() is being called on a string that contains the UTF-8 encoded bytes, and that is what causes the ASCII codec to fail.

>>> u"%s" % x
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 1: ordinal not in range(128)
>>> unicode(x)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 1: ordinal not in range(128)
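The same failure can be reproduced explicitly. Python 3 no longer performs this conversion implicitly, so the sketch below (assuming Python 3, with bytes standing in for Python 2's str) spells out what Python 2 does behind the scenes: the UTF-8 bytes of "[©au]" are decoded with the ASCII codec, which rejects the 0xc2 lead byte of ©, while decoding with the codec that was actually used for encoding succeeds.

```python
# Bytes equivalent to Python 2's str([a]) for the question's class: b'[\xc2\xa9au]'
encoded = "[\u00a9au]".encode("utf-8")

# What Python 2 does implicitly when a byte string meets a unicode format string:
error = None
try:
    encoded.decode("ascii")
except UnicodeDecodeError as exc:
    error = str(exc)  # same message as in the traceback above

# Decoding with the encoding that was actually used works:
text = encoded.decode("utf-8")  # "[©au]"
```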

First of all, ask yourself what you're trying to achieve. If all you want is a round-trippable representation of the list, you should simply do the following:

class A(object):
    def __unicode__(self):
        return u"©au"
    def __repr__(self):
        return repr(unicode(self))
    __str__ = __repr__

>>> A()
u'\xa9au'
>>> [A()]
[u'\xa9au']
>>> u"%s" % [A()]
u"[u'\\xa9au']"
>>> "%s" % [A()]
"[u'\\xa9au']"
>>> print u"%s" % [A()]
[u'\xa9au']

That's how it's supposed to work. String representations of Python lists are not something a user should see, so it makes sense for them to contain escaped characters.
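"Round-trippable" here means the representation can be parsed back into an equal value. A minimal sketch of the same idea, assuming Python 3 (where str() returns text and repr() of a text string needs no u prefix), using ast.literal_eval from the standard library to check the round trip:

```python
import ast


class A(object):
    def __str__(self):
        return "\u00a9au"  # "©au"

    def __repr__(self):
        # Delegate to repr() of the text so the result is a valid string literal.
        return repr(str(self))


items = [A(), A()]
text = repr(items)               # "['©au', '©au']"
parsed = ast.literal_eval(text)  # ['©au', '©au']
```

The round trip recovers the text, not the A instances: parsed is a list of plain str objects.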

If you want to create a unicode string from a list of objects that unicode() can convert, try the following:

u''.join([unicode(v) for v in [a,a]])

Since this question involves a lot of confusing unicode behavior, I thought I'd offer an analysis of what is going on here.

It all comes down to the implementation of __unicode__ and __repr__ in the built-in list class. Basically, it is equivalent to:

class list(object):
    def __repr__(self):
        return "[%s]" % ", ".join(repr(e) for e in self.elements)
    def __str__(self):
        return repr(self)
    def __unicode__(self):
        return str(self).decode()

Actually, list doesn't even define the __unicode__ and __str__ methods, which makes sense when you think about it.

When you write:

u"%s" % [a]                          # it expands to
u"%s" % unicode([a])                 # which expands to
u"%s" % repr([a]).decode()           # which expands to
u"%s" % ("[%s]" % repr(a)).decode()  # (simplified a little bit)
u"%s" % ("[%s]" % unicode(a).encode('utf-8')).decode()  

That last line is an expansion of repr(a), using the implementation of __repr__ from the question.

So, as you can see, the object is first encoded in UTF-8, only to be decoded later with the system default encoding (normally ASCII in Python 2), which cannot represent characters such as ©.

As some of the other answers mentioned, you can write your own function, or even subclass list, like so:

class mylist(list):
    def __unicode__(self):
        return u"[%s]" % u", ".join(map(unicode, self))

Note that this format is not round-trippable. It can even be misleading:

>>> unicode(mylist([]))
u'[]'
>>> unicode(mylist(['']))
u'[]'

Of course, you can write a quote_unicode function to make it round-trippable, but this is the moment to ask yourself what the point is. The unicode and str functions are meant to create a representation of an object that makes sense to a user. For programmers, there's the repr function. Raw lists are not something a user is ever supposed to see. That's why the list class does not implement the __unicode__ method.
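The quote_unicode function mentioned above is left to the reader. One hedged way to get the same effect in Python 3 is to quote each element with repr() of its text, which swaps in the built-in repr() for a hand-written quoting function; QuotedList is a hypothetical name used only for illustration:

```python
class QuotedList(list):
    """Hypothetical list subclass whose str() quotes each element's text,
    so an empty list and a list holding an empty string no longer look alike."""

    def __str__(self):
        # repr(str(item)) yields a quoted, escaped string literal per element.
        return "[%s]" % ", ".join(repr(str(item)) for item in self)


empty = str(QuotedList([]))        # "[]"
one_blank = str(QuotedList([""]))  # "['']"
```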

To get a better idea of what happens when, play with this little class:

class B(object):
    def __unicode__(self):
        return u"unicode"
    def __repr__(self):
        return "repr"
    def __str__(self):
        return "str"


>>> b = B()
>>> b
repr
>>> [b]
[repr]
>>> unicode(b)
u'unicode'
>>> unicode([b])
u'[repr]'

>>> print b
str
>>> print [b]
[repr]
>>> print unicode(b)
unicode
>>> print unicode([b])
[repr]
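
For comparison, a sketch of the same experiment in Python 3 (an assumption beyond the original answer: in Python 3, __unicode__ is no longer consulted and str() always returns text). Formatting a list still goes through each element's __repr__:

```python
class B(object):
    def __repr__(self):
        return "repr"

    def __str__(self):
        return "str"


b = B()
as_str = str(b)         # "str"
as_repr = repr(b)       # "repr"
list_str = str([b])     # "[repr]": list formatting uses each element's repr()
formatted = "%s" % [b]  # "[repr]"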

# -*- coding: utf-8 -*-

class A(object):
    def __unicode__(self):
        return u"©au"

    def __repr__(self):
        return unicode(self).encode('ascii', 'replace')

    __str__ = __repr__

a = A()

>>> u"%s" % a
u'\xa9au'
>>> u"%s" % [a]
u'[?au]'
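The 'replace' error handler substitutes ? for anything ASCII cannot represent, which is why the list shows [?au]. If losing the character is not acceptable, the standard 'backslashreplace' handler keeps it as an escape instead. A short sketch of both handlers, written for Python 3 where encode() returns bytes:

```python
text = "\u00a9au"  # "©au"

replaced = text.encode("ascii", "replace")          # b'?au': the character is lost
escaped = text.encode("ascii", "backslashreplace")  # b'\\xa9au': the code point survives as an escape
```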

repr and str are both supposed to return str objects, at least up to Python 2.6.x. You're getting the decode error because repr() is trying to convert your result into a str, and it's failing.

I believe this has changed in Python 3.x, where str is the unicode text type.
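A sketch of the question's class ported to Python 3 (the port is an assumption, not part of the original answers). Because __str__ and __repr__ both return text and there is no implicit ASCII decode during formatting, every case from the question works:

```python
class A(object):
    def __str__(self):
        return "\u00a9au"  # "©au"

    # Same aliasing as in the question, but now both methods return text.
    __repr__ = __str__


a = A()
s1 = "%s" % a          # "©au"
s2 = "%s" % [a]        # "[©au]": no UnicodeDecodeError
s3 = "%s" % str([a])   # "[©au]"
joined = ", ".join(str(x) for x in [a, a])  # "©au, ©au"
```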

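Statement: the technical posts on this site follow the CC BY-SA 4.0 license. If you reproduce them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.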

 
Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM