Python - can't concatenate multiple non-ASCII strings

Question

I'm trying to create a new string that combines more than one string containing special characters. This doesn't work:

# -*- coding: utf-8 -*-
str1 = "I am"
str2 = "español"
str3 = "%s %s %s" % (str1, u'–', str2)
print str3
>> Traceback (most recent call last):
  File "myscript.py", line 5, in <module>
    str3 = "%s %s %s" % (str1, u'–', str2)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 4: ordinal not in range(128)

Strangely, if I remove either the ñ or the – character, the string is created correctly:

# -*- coding: utf-8 -*-
str1 = "I am"
str2 = "espaol"
str3 = "%s %s %s" % (str1, u'–', str2)
print str3
>> I am – espaol

Or:

# -*- coding: utf-8 -*-
str1 = "I am"
str2 = "español"
str3 = "%s %s" % (str1, str2)
print str3
>> I am español

What's going wrong?

Answer 1

You are mixing Unicode strings and byte strings. Don't do that. Make sure all of your strings are of the same type, preferably unicode.

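When you mix str and unicode, Python implicitly uses the ASCII codec to decode or encode one type into the other. Avoid that implicit conversion by encoding or decoding explicitly, so that everything ends up as one type.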
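To see that implicit step in isolation: Python 3 no longer converts between text and bytes behind your back (concatenating a str and a bytes object raises TypeError), but the decode that failed in the question can be reproduced by calling the ASCII codec by hand. A minimal Python 3 sketch, assuming the text is UTF-8 encoded; `decode_as` is a hypothetical helper for illustration only:

```python
# Python 3 sketch: reproduce the implicit ASCII decode that Python 2
# performed when a byte string was combined with a unicode string.
raw = "español".encode("utf-8")  # UTF-8 bytes: b'espa\xc3\xb1ol'


def decode_as(data: bytes, codec: str) -> str:
    """Hypothetical helper: decode data with codec, or return the error text."""
    try:
        return data.decode(codec)
    except UnicodeDecodeError as exc:
        return "UnicodeDecodeError: " + str(exc)


ascii_result = decode_as(raw, "ascii")  # what Python 2 attempted implicitly
utf8_result = decode_as(raw, "utf-8")   # the explicit, correct decode

print(ascii_result)
# → UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 4: ordinal not in range(128)
print(utf8_result)
# → español
```

The ASCII attempt fails on exactly the byte reported in the question's traceback, 0xc3 at position 4, which is the first byte of the two-byte UTF-8 encoding of ñ.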

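That implicit ASCII decoding is what causes your UnicodeDecodeError. You are mixing two str objects (byte strings, str1 and str2) with a unicode string (u'–'), but only str1 can be decoded as ASCII; str2 contains UTF-8 data, so decoding it fails. This also explains your two working variants: without the ñ, str2 is pure ASCII and decodes cleanly; without u'–', no unicode object is involved, so Python never decodes anything and simply joins the bytes. Creating unicode strings explicitly, or decoding the data explicitly, makes it work: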

# -*- coding: utf-8 -*-
str1 = u"I am"     # Unicode strings
str2 = u"español"  # Unicode strings
str3 = u"%s %s %s" % (str1, u'–', str2)
print str3

Or:

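# -*- coding: utf-8 -*-
str1 = "I am"      # byte strings (str)
str2 = "español"
# Decode each byte string to unicode before formatting
str3 = u"%s %s %s" % (str1.decode('utf-8'), u'–', str2.decode('utf-8'))
print str3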

Note that I also used a Unicode string literal for the format string!
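In Python 3 the same advice applies, with str as the text type and bytes as the byte type: decode byte data once when it enters the program, work only with text, and encode once on the way out. A minimal sketch, assuming all incoming bytes are UTF-8; `to_text` is a hypothetical helper, not a standard-library function:

```python
def to_text(value, encoding="utf-8"):
    """Hypothetical helper: return value as str, decoding bytes if needed.

    Assumes any byte input is encoded with `encoding` (UTF-8 by default).
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


str1 = b"I am"                    # byte string, e.g. read from a file
str2 = "español".encode("utf-8")  # UTF-8 encoded bytes
parts = [to_text(p) for p in (str1, "–", str2)]
str3 = "%s %s %s" % tuple(parts)  # every piece is str now, no mixing
print(str3)                       # → I am – español

out = str3.encode("utf-8")        # encode explicitly only when writing bytes out
```

Normalizing at the boundary keeps the formatting step free of any hidden codec choice, which is exactly what the implicit ASCII conversion in Python 2 got wrong.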

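You really should read up on Unicode, codecs, and Python. I strongly recommend the following articles:

Ned Batchelder, "Pragmatic Unicode"
Joel Spolsky, "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)"
The Python Unicode HOWTO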


Solution 1
1, accepted, 2017-02-27 08:34:00