Does __str__() call the decode() method behind the scenes?

It seems to me that the built-in methods __repr__ and __str__ differ in an important way in their base definitions.

>>> t2 = u'\u0131\u015f\u0131k'
>>> print t2
ışık
>>> t2
Out[0]: u'\u0131\u015f\u0131k'

t2.decode raises an error, since t2 is a unicode string.

>>> enc = 'utf-8'
>>> t2.decode(enc)
------------------------------------------------------------
Traceback (most recent call last):
  File "<ipython console>", line 1, in <module>
  File "C:\java\python\Python25\Lib\encodings\utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)

__str__ raises an error, as if the decode() function were being called:

>>> t2.__str__()
------------------------------------------------------------
Traceback (most recent call last):
  File "<ipython console>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)

but __repr__ works without a problem:

>>> t2.__repr__()
Out[0]: "u'\\u0131\\u015f\\u0131k'"

Why does __str__ produce an error whereas __repr__ works properly?

This small difference seems to cause a bug in a Django application that I am working on.

Basically, in Python 2 __str__ has to return a byte string (str), so unicode.__str__() implicitly encodes with the default codec, which is ASCII. Since t2 contains Unicode code points above the ASCII range, it cannot be represented that way, and the encoding step fails. t2.decode() fails for the same reason: decode() is meant for byte strings, so Python first tries to encode t2 to ASCII. __repr__, on the other hand, tries to output the Python code needed to recreate the object. You'll see that the output from repr(t2) (this syntax is preferred to t2.__repr__()) is exactly what you set t2 equal to on the first line. The result from repr looks roughly like ['\\', 'u', '0', ...], which are all ASCII values, but the output from str is trying to be [unichr(0x0131), unichr(0x015f), unichr(0x0131), 'k'], most of which are above the range of characters acceptable in a Python 2 byte string. Generally, when dealing with Django applications, you should use __unicode__ for everything, and never touch __str__.
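The difference described above can be reproduced with explicit codec calls. This is only a sketch of roughly what happens behind the two methods, not the interpreter's actual code path; the names ascii_error and escaped are made up for illustration, and the snippet is written so that it runs on Python 3 (it should behave the same on Python 2.7).

```python
t2 = u"\u0131\u015f\u0131k"  # same text as in the question

# Roughly what unicode.__str__() has to do in Python 2: encode the text
# with the default codec, which is ASCII. This fails for non-ASCII text.
try:
    t2.encode("ascii")
    ascii_error = None
except UnicodeEncodeError as exc:
    ascii_error = exc

# Roughly what repr() does: escape every non-ASCII code point, so the
# result only ever contains ASCII characters.
escaped = t2.encode("unicode_escape")

print(ascii_error)  # the same kind of error as in the tracebacks
print(escaped)      # backslash escapes, all ASCII
```

The first call fails at the first three characters, which matches the "position 0-2" in the tracebacks from the question.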

More info is in the Django documentation on strings.

In general, calling str.__unicode__() or unicode.__str__() is a very bad idea, because bytes can't be safely converted to Unicode code points, or vice versa, without naming an encoding. The exception is ASCII values, which are generally the same in all single-byte encodings. The problem is that you're using the wrong method for conversion.

To convert unicode to str, you should use encode():

>>> t1 = u"\u0131\u015f\u0131k"
>>> t1.encode("utf-8")
'\xc4\xb1\xc5\x9f\xc4\xb1k'

To convert str to unicode, use decode():

>>> t2 = '\xc4\xb1\xc5\x9f\xc4\xb1k'
>>> t2.decode("utf-8")
u'\u0131\u015f\u0131k'
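The two conversions above form a round trip, and the codec has to match in both directions. A minimal sketch, assuming ISO-8859-9 (the Latin-5 Turkish code page) as a second encoding purely for contrast; the variable names are illustrative, and the code is written to run on Python 3.

```python
text = u"\u0131\u015f\u0131k"

# The same text gives different bytes under different encodings.
utf8_bytes = text.encode("utf-8")         # two bytes per non-ASCII letter
latin5_bytes = text.encode("iso-8859-9")  # one byte per letter

# Decoding with the codec that was used for encoding restores the text.
assert utf8_bytes.decode("utf-8") == text
assert latin5_bytes.decode("iso-8859-9") == text

# Decoding with the wrong codec fails (or, with other codec pairs,
# silently produces the wrong characters).
try:
    latin5_bytes.decode("utf-8")
    wrong_codec_failed = False
except UnicodeDecodeError:
    wrong_codec_failed = True

print(utf8_bytes, latin5_bytes, wrong_codec_failed)
```

This is why an implicit conversion such as unicode.__str__() is risky: it has to guess the codec, and in Python 2 the guess is ASCII.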

To add a bit of support to John's good answer:

To understand the naming of the two methods encode() and decode(), you just have to see that Python considers unicode strings of the form u'...' to be in the reference format. You encode to go from the reference format into another format (e.g. utf-8), and you decode from some other format to get back to the reference format. The unicode format is always considered the "real thing" :-).

Note that in Python 3, Unicode is the default, and __str__() should always give you Unicode.
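For comparison, here is a small Python 3 sketch of that last point. The Label class is a hypothetical example and does not come from the question; it only shows that __str__ returns Unicode text and that getting bytes is a separate, explicit step.

```python
# Python 3 only: str is Unicode text, bytes is the encoded form.

class Label:  # hypothetical class, for illustration only
    def __init__(self, text):
        self.text = text

    def __str__(self):
        # Returns str (Unicode); no implicit ASCII encoding takes place.
        return self.text


label = Label("\u0131\u015f\u0131k")
rendered = str(label)                # Unicode text, non-ASCII is fine
encoded = rendered.encode("utf-8")   # explicit step to obtain bytes

print(encoded)
```

In Python 3 the str type has no decode() method at all, so the confusing implicit encode seen in the question cannot occur.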
