Python error: 'utf8' codec can't decode byte 0x92 in position 85: invalid start byte

Question

I am using Python 2.7 and lxml. My code is as below:

import urllib
from lxml import html

def get_value(el):
    return get_text(el, 'value') or el.text_content()

response = urllib.urlopen('http://www.edmunds.com/dealerships/Texas/Frisco/DavidMcDavidHondaofFrisco/fullsales-504210667.html').read()
dom = html.fromstring(response)

try:
    description = get_value(dom.xpath("//div[@class='description item vcard']")[0].xpath(".//p[@class='sales-review-paragraph loose-spacing']")[0])
except IndexError, e:
    description = ''

The code crashes inside the try block, giving this error:

UnicodeDecodeError at /
'utf8' codec can't decode byte 0x92 in position 85: invalid start byte

The string that could not be encoded/decoded was: ouldn t be

I have tried a lot of techniques, including .encode('utf8'), but none of them solves the problem. I have two questions:

How do I solve this problem?
How can my app crash when the problem code is inside a try/except block?

Answer 1

The page is being served with charset=ISO-8859-1. Decode from that to unicode.

[Image: snapshot of the response details from the browser. Credit: @Old Panda]
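To make this concrete, here is a minimal sketch of decoding raw bytes with the charset named in a Content-Type header. The header value and the byte string are made-up samples (the bytes only mimic the "ouldn t be" fragment from the error), and `charset_from_content_type` is a hypothetical helper, not part of urllib or lxml.

```python
from email.message import Message


def charset_from_content_type(value, default='utf-8'):
    """Return the charset named in a Content-Type value (hypothetical helper)."""
    msg = Message()
    msg['Content-Type'] = value
    # get_content_charset() returns the charset lower-cased, or None if absent.
    return msg.get_content_charset() or default


# Assumed sample data: a header like the one the page sends, and bytes that
# contain 0x92, which is not a valid UTF-8 start byte.
content_type = 'text/html; charset=ISO-8859-1'
raw = b"shouldn\x92t be"

charset = charset_from_content_type(content_type)
text = raw.decode(charset)  # succeeds: every byte value is valid in ISO-8859-1
```

Note that ISO-8859-1 maps byte 0x92 to a C1 control character. If the decoded text shows an odd character where an apostrophe is expected, the page may really be Windows-1252 despite what the header says.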

Answer 2

Your except clause only handles exceptions of the IndexError type. The problem was a UnicodeDecodeError, which is not an IndexError, so that except clause does not handle it.

It's also not clear what 'get_value' does, and that may well be where the actual problem arises.
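A small sketch of the point about exception types, using made-up sample bytes and hypothetical function names. The first function mirrors the shape of the question's try block; the second lists both exception types in a tuple.

```python
def first_text(chunks):
    """Mirror the question's try block: only IndexError is handled."""
    try:
        return chunks[0].decode('utf-8')
    except IndexError:
        return u''


def first_text_safely(chunks):
    """Handle both failure modes by naming both exception types."""
    try:
        return chunks[0].decode('utf-8')
    except (IndexError, UnicodeDecodeError):
        return u''


raw = b"shouldn\x92t be"  # assumed sample bytes, invalid as UTF-8

try:
    first_text([raw])
    escaped = False
except UnicodeDecodeError:
    # The error passed straight through `except IndexError`.
    escaped = True
```

An empty list still takes the IndexError path in both functions; only the second one also absorbs the decoding failure.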

Answer 3

- Skip undecodable characters on error, or decode the bytes correctly to unicode.
- You only catch IndexError, not UnicodeDecodeError.
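The two options in the first bullet could look like this. The byte string is an assumed sample, and the use of cp1252 (Windows-1252) is a guess about what the bytes really are: in that encoding, 0x92 is the right single quotation mark.

```python
raw = b"shouldn\x92t be"  # assumed sample bytes, invalid as UTF-8

# Option 1: skip or mark the bytes that cannot be decoded.
skipped = raw.decode('utf-8', 'ignore')   # drops the bad byte
marked = raw.decode('utf-8', 'replace')   # substitutes U+FFFD for it

# Option 2: decode with the encoding the bytes were actually written in.
# Assumption: the text is Windows-1252, where 0x92 is U+2019.
decoded = raw.decode('cp1252')
```

Skipping loses the apostrophe entirely, so decoding with the right encoding is preferable whenever it is known.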

Answer 4

Decode the response to unicode, handling errors (for example, ignoring undecodable bytes), before parsing with html.fromstring.
Catch the UnicodeDecodeError, or all errors.
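One way to combine both suggestions is a decode step that never raises, run before the parser sees the data. This is only a sketch: `to_unicode` is a hypothetical helper, and the fallback order (UTF-8, then the declared charset, then UTF-8 with bad bytes dropped) is an assumption, not something the answer specifies.

```python
def to_unicode(raw, declared_charset=None):
    """Decode bytes to text without raising (hypothetical helper).

    Tries UTF-8, then the declared charset, then UTF-8 with bad bytes dropped.
    """
    for encoding in ('utf-8', declared_charset):
        if not encoding:
            continue
        try:
            return raw.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            # LookupError covers a charset name that Python does not know.
            continue
    return raw.decode('utf-8', 'ignore')


raw = b"shouldn\x92t be"  # assumed sample bytes, invalid as UTF-8
text = to_unicode(raw, 'ISO-8859-1')
```

In the question's code, the result of `to_unicode(response, ...)` would then be passed to html.fromstring in place of the raw response.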

4 answers

Answer 1: score 8, accepted, 2012-04-18 14:16:57
Answer 2: score 1, 2012-04-18 14:14:17
Answer 3: score 0, 2012-04-18 14:13:13
Answer 4: score 0, 2012-04-18 14:14:21
