Does __str__() call the encode() method behind the scenes?

Question

It seems to me that the built-in methods __repr__ and __str__ have an important difference in their base definition.

>>> t2 = u'\u0131\u015f\u0131k'
>>> print t2
ışık
>>> t2
Out[0]: u'\u0131\u015f\u0131k'

t2.decode() raises an error, since t2 is already a unicode string:

>>> enc = 'utf-8'
>>> t2.decode(enc)
------------------------------------------------------------
Traceback (most recent call last):
  File "<ipython console>", line 1, in <module>
  File "C:\java\python\Python25\Lib\encodings\utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)

__str__ raises an error as if the decode() function were being called:

>>> t2.__str__()
------------------------------------------------------------
Traceback (most recent call last):
  File "<ipython console>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)

but __repr__ works without a problem:

>>> t2.__repr__()
Out[0]: "u'\\u0131\\u015f\\u0131k'"

Why does __str__ produce an error whereas __repr__ works properly?

This small difference seems to cause a bug in a Django application that I am working on.

Answer 1

Basically, in Python 2, __str__ has to produce a byte string (str), and converting a unicode object to str implicitly encodes it with the default codec, which is ASCII. Since t2 contains Unicode code points above the ASCII range, that encoding fails. __repr__, on the other hand, tries to output the Python code needed to recreate the object. You'll see that the output from repr(t2) (this syntax is preferred to t2.__repr__()) is exactly what you set t2 equal to on the first line. The result from repr looks roughly like ['\\', 'u', '0', ...], which are all ASCII values, but the output from str is trying to be [unichr(0x0131), unichr(0x015f), unichr(0x0131), 'k'], most of which are above the range of characters acceptable in a Python 2 byte string.

Generally, when dealing with Django applications, you should use __unicode__ for everything, and never touch __str__.

More info in the Django documentation on strings.
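To make the difference concrete, here is a minimal sketch written for Python 3, whereas the transcripts in this thread are Python 2. It assumes that Python 2's str() on a unicode object amounts to an implicit ASCII encode, and that ascii() is a fair stand-in for Python 2's repr() of a unicode object. The helper name python2_style_str is hypothetical and exists only for this illustration.

```python
# Python 3 sketch; the thread itself shows Python 2 sessions.
t2 = "\u0131\u015f\u0131k"  # same text as the question's t2


def python2_style_str(text):
    """Hypothetical helper: mimic Python 2's str(unicode_obj),
    modelled here as an implicit encode with the ASCII codec."""
    return text.encode("ascii")


try:
    python2_style_str(t2)
    str_failed = False
except UnicodeEncodeError as exc:
    # Three of the four characters are outside the ASCII range.
    str_failed = True
    print("str-like conversion failed:", exc)

# ascii() escapes every non-ASCII character, so the result is plain ASCII,
# much like repr() of a unicode object in Python 2.
escaped = ascii(t2)
print(escaped)
```

The escaped form contains only backslashes, letters and digits, which is why the repr-style path never hits the ASCII limit.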

Answer 2

In general, calling str.__unicode__() or unicode.__str__() is a very bad idea, because bytes can't be safely converted to Unicode code points, or vice versa, without knowing the encoding. The exception is ASCII values, which are generally the same in all single-byte encodings. The problem is that you're using the wrong method for conversion.

To convert unicode to str, you should use encode():

>>> t1 = u"\u0131\u015f\u0131k"
>>> t1.encode("utf-8")
'\xc4\xb1\xc5\x9f\xc4\xb1k'

To convert str to unicode, use decode():

>>> t2 = '\xc4\xb1\xc5\x9f\xc4\xb1k'
>>> t2.decode("utf-8")
u'\u0131\u015f\u0131k'
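For readers on Python 3, the same round trip might look like the sketch below. It assumes the Python 3 naming, where the text type is str and the byte type is bytes, so each type carries only one of the two conversion methods.

```python
# Python 3 sketch of the encode/decode round trip shown above.
t1 = "\u0131\u015f\u0131k"          # text (Unicode code points)

encoded = t1.encode("utf-8")        # text -> bytes
decoded = encoded.decode("utf-8")   # bytes -> text

print(encoded)
print(decoded == t1)

# In Python 3 the wrong direction is not even available as a method:
print(hasattr(t1, "decode"), hasattr(encoded, "encode"))
```

Because the mismatched method does not exist, the implicit ASCII step that confused the question cannot happen silently in Python 3.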

Answer 3

To add a bit of support to John's good answer:

To understand the naming of the two methods encode() and decode(), you just have to see that Python considers unicode strings of the form u'...' to be in the reference format. You encode when going from the reference format into another format (e.g. UTF-8), and you decode from some other format to get back to the reference format. The unicode format is always considered the "real thing" :-).
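The "reference format" idea can be sketched in Python 3 as follows. The choice of codecs is only an assumption for illustration (ISO-8859-9 is picked because it can represent these Turkish letters); the point is that every encoded form decodes back to the same text.

```python
# Python 3 sketch: one reference text, several encoded byte forms.
reference = "\u0131\u015f\u0131k"

encoded_forms = {}
for codec in ("utf-8", "utf-16", "iso-8859-9"):
    # encode: reference format -> some byte format
    encoded_forms[codec] = reference.encode(codec)

for codec, raw in encoded_forms.items():
    # decode: byte format -> reference format
    restored = raw.decode(codec)
    print(codec, len(raw), "bytes, round trip ok:", restored == reference)
```

The byte strings differ from codec to codec, but the decoded text is identical each time.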

Answer 4

Note that in Python 3, Unicode is the default, and __str__() should always give you Unicode.
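A small Python 3 sketch of that point is given below. The classes Word and BytesWord are hypothetical and exist only for this illustration; they show that __str__ may return non-ASCII text directly, while returning bytes is rejected.

```python
# Python 3 sketch; Word and BytesWord are made-up illustration classes.
class Word:
    def __init__(self, text):
        self.text = text

    def __str__(self):
        # Returning Unicode text is the expected behaviour in Python 3.
        return self.text


class BytesWord(Word):
    def __str__(self):
        # Returning bytes from __str__ is an error in Python 3.
        return self.text.encode("utf-8")


as_text = str(Word("\u0131\u015f\u0131k"))
print(as_text == "\u0131\u015f\u0131k")

try:
    str(BytesWord("\u0131\u015f\u0131k"))
    bytes_rejected = False
except TypeError as exc:
    bytes_rejected = True
    print("rejected:", exc)
```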

4 answers

Answer 1: 7, accepted, 2009-08-12 18:20:16
Answer 2: 5, 2009-08-12 18:34:12
Answer 3: 2, 2009-08-12 19:04:37
Answer 4: 0, 2009-08-12 20:39:34
