UnicodeDecodeError: 'ascii' codec can't decode byte 0xa3

Question

I have this string: 'Velcro Back Rest \xa36.99'. Note that it does not have a u prefix. It's just plain ASCII.

How do I convert it to unicode?

I tried this:

>>> unicode('Velcro Back Rest \xa36.99')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa3 in position 17: ordinal not in range(128)

This answer explains it nicely, but I have the same question as the OP of that question. In reply to that comment, Winston says "You should not encoding a string object ..."

But the framework I am working with requires that it be converted to a unicode string. I use scrapy and I have this line:

loader.add_value('name', product_name)

Here product_name contains the problematic string, and it throws the error.

Answer 1

You need to specify an encoding to decode the bytes to Unicode with:

>>> 'Velcro Back Rest \xa36.99'.decode('latin1')
u'Velcro Back Rest \xa36.99'
>>> print 'Velcro Back Rest \xa36.99'.decode('latin1')
Velcro Back Rest £6.99
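The session above is Python 2. As a minimal sketch for readers on Python 3, where str is already Unicode and raw bytes need a b prefix, the same decode might look like this. The variable names are illustrative, and latin1 is the same guess as above, not something the data itself confirms.

```python
# Python 3 sketch: raw bytes need an explicit b prefix.
raw = b'Velcro Back Rest \xa36.99'

# Name the codec explicitly instead of relying on an implicit ASCII default.
text = raw.decode('latin1')

# ascii() escapes non-ASCII characters, so this prints safely on any terminal.
print(ascii(text))
```

Byte 0xa3 maps to U+00A3, the pound sign, under Latin 1, which is why the decoded text reads as a price.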

In this case, I was able to guess the encoding from experience; you need to provide the correct codec for each encoding you encounter. For web data, that is usually included in the form of the content-type header:

Content-Type: text/html; charset=iso-8859-1

where iso-8859-1 is the official standard name for the Latin 1 encoding, for example. Python recognizes latin1 as an alias for iso-8859-1.
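One way to act on that header, sketched in Python 3 with only the standard library: pull the charset parameter out of the Content-Type value and pass it to decode. The helper name charset_from_content_type and its latin1 fallback are assumptions made for this illustration; they are not part of scrapy or of the original answer.

```python
import codecs
from email.message import Message

def charset_from_content_type(header_value, default='latin1'):
    """Return the charset parameter of a Content-Type header value.

    Hypothetical helper; the fallback default is an assumption.
    """
    msg = Message()
    msg['Content-Type'] = header_value
    # get_content_charset() returns the lower-cased charset, or the fallback.
    return msg.get_content_charset(default)

charset = charset_from_content_type('text/html; charset=iso-8859-1')
body = b'Velcro Back Rest \xa36.99'
text = body.decode(charset)

# latin1 and iso-8859-1 resolve to the same codec in Python.
same_codec = codecs.lookup('latin1').name == codecs.lookup(charset).name
```

If the header carries no charset parameter, the helper returns the fallback, so the caller still has to decide which default is reasonable for the data at hand.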

Note that your input data is not plain ASCII. If it was, it'd only use bytes in the range 0 through to 127; \xa3 is 163 decimal, so outside of the ASCII range.
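The range argument can be checked directly. The following Python 3 sketch (variable names are illustrative) locates every byte above 127 and confirms that the ASCII codec fails at the same position reported in the traceback from the question.

```python
raw = b'Velcro Back Rest \xa36.99'

# 0xa3 is 163 in decimal, which is above the ASCII maximum of 127.
pound_byte = 0xa3

# Collect the position of every byte outside the ASCII range.
non_ascii_positions = [i for i, b in enumerate(raw) if b > 127]

try:
    raw.decode('ascii')
    decoded_as_ascii = True
except UnicodeDecodeError as exc:
    decoded_as_ascii = False
    # exc.start is the index of the first byte the codec could not decode.
    error_position = exc.start
```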

Answer 1 (14, accepted), 2013-06-20 17:06:15
