Python: unicode() function vs. u prefix

Question

I ran into UnicodeEncodeError trouble in my Django project and finally fixed it by changing the return value of the faulty __unicode__ method from

return unicode("<span><b>{0}</b>{1}<span>".format(val_str, self.text))

to

return u"<span><b>{0}</b>{1}<span>".format(val_str, self.text)

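But I'm confused about why this works (or rather, why there was a problem in the first place). Don't the u prefix and the unicode function do the same thing? When I try them in the console, they seem to give the same result: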

>>> # with the function
>>> test = unicode("<span><b>{0}</b>{1}<span>".format(2, 4))
>>> test
u'<span><b>2</b>4<span>'
>>> type(test)
<type 'unicode'>

>>> # with the prefix
>>> test = u"<span><b>{0}</b>{1}<span>".format(2, 4)
>>> test
u'<span><b>2</b>4<span>'
>>> type(test)
<type 'unicode'>

But the encoding seems to be handled differently depending on which one is used. What is going on here?

Answer 1

Your problem lies in what you apply unicode() to; your two expressions are not equivalent.

unicode("<span><b>{0}</b>{1}<span>".format(val_str, self.text))

applies unicode() to the result of:

"<span><b>{0}</b>{1}<span>".format(val_str, self.text)

whereas

u"<span><b>{0}</b>{1}<span>".format(val_str, self.text)

is equivalent to:

unicode("<span><b>{0}</b>{1}<span>").format(val_str, self.text)

Note the position of the closing parenthesis!

So your first version formats first, and only then converts the result of the formatting to unicode. That is an important difference!

When you use str.format() with unicode values, those values are passed to str(), which implicitly encodes them to ASCII. This causes your exception:

>>> 'str format: {}'.format(u'unicode ascii-range value works')
'str format: unicode ascii-range value works'
>>> 'str format: {}'.format(u"unicode latin-range value doesn't work: å")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe5' in position 40: ordinal not in range(128)

Calling unicode() on the result makes no difference; the exception has already been raised.
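
The implicit step can be made visible by doing the ASCII encoding by hand. This is only a minimal sketch: encodes_to_ascii is a hypothetical helper name, not something from the question or from Python itself, and the non-ASCII character is written as an escape so the snippet should run unchanged on Python 2 and Python 3.

```python
# Hypothetical helper (an assumption for illustration): it performs explicitly
# the ASCII encoding that Python 2's str.format() attempts implicitly when it
# is handed a unicode value.
def encodes_to_ascii(value):
    """Return True if the unicode text `value` can be encoded as ASCII."""
    try:
        value.encode('ascii')
    except UnicodeEncodeError:
        # Same exception type as in the traceback above.
        return False
    return True


print(encodes_to_ascii(u'unicode ascii-range value works'))
print(encodes_to_ascii(u"unicode latin-range value doesn't work: \xe5"))
```

The first call succeeds because every character is in the ASCII range; the second fails on \xe5, which is the same character that trips up the str.format() call.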

Formatting with unicode.format(), on the other hand, has no such problem:

>>> u'str format: {}'.format(u'unicode latin-range value works: å')
u'str format: unicode latin-range value works: \xe5'
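
Applied to the method from the question, the fix amounts to keeping the template itself unicode. The following is a rough sketch rather than the asker's actual code: TEMPLATE and render are hypothetical names, the markup is copied as-is from the question, and the arguments are assumed to be unicode (or ASCII-only) values.

```python
# Hypothetical names (assumptions for illustration): TEMPLATE and render.
# Because the template is a unicode literal, formatting happens in unicode
# and no implicit ASCII encoding of the arguments takes place.
TEMPLATE = u"<span><b>{0}</b>{1}<span>"


def render(val_str, text):
    """Format the two values into the unicode template."""
    return TEMPLATE.format(val_str, text)


# A non-ASCII value passes through untouched.
result = render(u'2', u'caf\xe9')
print(repr(result))
```

In a model's __unicode__ method, the body would then reduce to returning the formatted unicode template directly, with no unicode() call wrapped around it.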

Answer 1: 5, accepted, 2014-09-22 21:00:04
