UnicodeDecodeError: 'ascii' codec can't decode byte 0xa3
I got this string 'Velcro Back Rest \xa36.99'. Note it does not have a u in front of it; it's just plain ASCII.
How do I convert it to Unicode?
I tried this:
>>> unicode('Velcro Back Rest \xa36.99')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa3 in position 17: ordinal not in range(128)
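For reference, here is a minimal Python 3 sketch of the same failure, assuming the data arrives as a `bytes` object (the Python 3 counterpart of a Python 2 `str`). Called without an encoding, Python 2's `unicode()` decodes with the default ASCII codec, which is the same as calling `.decode('ascii')` explicitly. The helper name `first_undecodable_byte` is hypothetical:

```python
def first_undecodable_byte(raw: bytes, encoding: str = 'ascii'):
    """Return (index, byte value) of the first byte `encoding` rejects, or None."""
    try:
        raw.decode(encoding)
    except UnicodeDecodeError as exc:
        # exc.start is the offset of the first byte the codec could not decode
        return exc.start, raw[exc.start]
    return None


raw = b'Velcro Back Rest \xa36.99'  # bytes, like a Python 2 str
print(first_undecodable_byte(raw))  # (17, 163): the 0xa3 at position 17 from the traceback
```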
This answer explains it nicely, but I have the same question as the OP of that question. In his reply to that comment, Winston says "You should not encoding a string object ..."
But the framework I am working with requires that it be converted to a unicode string. I use Scrapy and I have this line:
loader.add_value('name', product_name)
Here product_name contains that problematic string, and it throws the error.
You need to specify an encoding to decode the bytes to Unicode with:
>>> 'Velcro Back Rest \xa36.99'.decode('latin1')
u'Velcro Back Rest \xa36.99'
>>> print 'Velcro Back Rest \xa36.99'.decode('latin1')
Velcro Back Rest £6.99
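The same decode in Python 3, wrapped in a small helper so a value can be normalized before it is handed to a framework that insists on Unicode. This is a sketch: `to_text` is a hypothetical name, and the `latin1` default is only correct if the source really is Latin-1 encoded:

```python
def to_text(value, encoding: str = 'latin1'):
    """Decode bytes with `encoding`; values that are already str pass through unchanged."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


raw = b'Velcro Back Rest \xa36.99'
print(to_text(raw))  # Velcro Back Rest £6.99
```

In the question's Scrapy code, the decoded result is what would be passed to `loader.add_value('name', ...)` instead of the raw byte string.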
In this case I was able to guess the encoding from experience; in general, you need to provide the correct codec for each encoding you encounter. For web data, the encoding is usually included in the form of the Content-Type header:
Content-Type: text/html; charset=iso-8859-1
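One way to pull the charset out of such a header with only the Python 3 standard library is to load it into an `email.message.Message` and call `get_content_charset()`, which returns the parameter lowercased (or the fallback you pass when there is none). The function name `charset_from_content_type` is hypothetical, and what to do when no charset is present is left to the caller:

```python
from email.message import Message


def charset_from_content_type(header: str, default=None):
    """Return the lowercased charset parameter of a Content-Type value, or `default`."""
    msg = Message()
    msg['Content-Type'] = header
    return msg.get_content_charset(default)


raw = b'Velcro Back Rest \xa36.99'
charset = charset_from_content_type('text/html; charset=iso-8859-1')
print(charset)              # iso-8859-1
print(raw.decode(charset))  # Velcro Back Rest £6.99
```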
where iso-8859-1 is, for example, the official standard name for the Latin-1 encoding. Python recognizes latin1 as an alias for iso-8859-1.
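A quick Python 3 check of that alias: `codecs.lookup()` resolves both spellings to the same codec, so both decode the bytes identically. The helper `same_codec` is a hypothetical name used only for illustration:

```python
import codecs


def same_codec(a: str, b: str) -> bool:
    """True if Python resolves both encoding names to the same codec."""
    return codecs.lookup(a).name == codecs.lookup(b).name


raw = b'Velcro Back Rest \xa36.99'
print(same_codec('latin1', 'iso-8859-1'))                # True
print(raw.decode('latin1') == raw.decode('iso-8859-1'))  # True
```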
Note that your input data is not plain ASCII. If it were, it would only use bytes in the range 0 through 127; \xa3 is 163 in decimal, so it is outside of the ASCII range.
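To see this directly in Python 3, you can list every byte above 127. `bytes.isascii()` requires Python 3.7 or later, and `non_ascii_bytes` is a hypothetical helper name:

```python
def non_ascii_bytes(data: bytes):
    """Return (index, value) for every byte outside ASCII's 0-127 range."""
    # Iterating over bytes in Python 3 yields ints
    return [(i, b) for i, b in enumerate(data) if b > 127]


raw = b'Velcro Back Rest \xa36.99'
print(raw.isascii())         # False
print(non_ascii_bytes(raw))  # [(17, 163)]
print(0xa3)                  # 163
```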