
Python error: 'utf8' codec can't decode byte 0x92 in position 85: invalid start byte

I am using Python 2.7 and lxml. My code is as follows:

import urllib
from lxml import html

def get_value(el):
    return get_text(el, 'value') or el.text_content()

response = urllib.urlopen('http://www.edmunds.com/dealerships/Texas/Frisco/DavidMcDavidHondaofFrisco/fullsales-504210667.html').read()
dom = html.fromstring(response)

try:
    description = get_value(dom.xpath("//div[@class='description item vcard']")[0].xpath(".//p[@class='sales-review-paragraph loose-spacing']")[0])
except IndexError, e:
    description = ''

The code crashes inside the try block with this error:

UnicodeDecodeError at /
'utf8' codec can't decode byte 0x92 in position 85: invalid start byte

The string that could not be encoded/decoded was: ouldn t be

I have tried a lot of techniques, including .encode('utf8'), but none of them solves the problem. I have 2 questions:

  1. How can I solve this problem?
  2. How can my app crash when the problem code is inside a try/except block?

The page is being served with charset=ISO-8859-1. Decode from that to unicode.

[Screenshot: page details as shown in the browser. Credit: @Old Panda]
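As a minimal sketch of that advice: the helper name `decode_body` and the sample bytes below are assumptions made for illustration, not part of the original code. The idea is to decode the raw response bytes with the charset the server declares before doing anything else with them. Note that in strict ISO-8859-1 the byte 0x92 maps to a control character, while in Windows-1252 (which pages labelled ISO-8859-1 often actually use) it is the right single quotation mark, so trying `cp1252` may give more readable text; whether that applies to this particular page is an assumption.

```python
# Sample bytes standing in for the HTTP response body (hypothetical data).
raw = b"I couldn\x92t be happier"


def decode_body(raw_bytes, declared="iso-8859-1"):
    """Decode raw response bytes using the charset the server declared."""
    return raw_bytes.decode(declared)


# Strict ISO-8859-1: 0x92 becomes the control character U+0092.
latin1_text = decode_body(raw)

# Windows-1252: 0x92 becomes U+2019, the right single quotation mark.
cp1252_text = decode_body(raw, "cp1252")
```

Either result is a unicode string, so a later step no longer has to guess at the byte encoding.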

Your except clause only handles exceptions of the IndexError type. The problem was a UnicodeDecodeError, which is not an IndexError, so that except clause does not handle it.

It's also not clear what 'get_value' does, and that may well be where the actual problem is arising.
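The point about the except clause can be illustrated with a small sketch. The function names `describe` and `describe_safely` are hypothetical, and the decode call merely stands in for whatever step raises the error in the real application. UnicodeDecodeError derives from ValueError, not from IndexError, so a clause written for IndexError lets it propagate.

```python
def describe(raw_bytes):
    """Mirror the original pattern: only IndexError is handled."""
    try:
        return raw_bytes.decode("utf-8")
    except IndexError:
        # Not reached for a decoding failure; the error escapes the try.
        return u""


def describe_safely(raw_bytes):
    """Handle both exception types explicitly."""
    try:
        return raw_bytes.decode("utf-8")
    except (IndexError, UnicodeDecodeError):
        return u""
```

Calling `describe(b"couldn\x92t")` raises UnicodeDecodeError, while `describe_safely` with the same input returns an empty string.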

    • Skip characters on error, or decode the response correctly to unicode.
    • You only catch IndexError, not UnicodeDecodeError.
  1. Decode the response to unicode, handling errors properly (for example, ignoring undecodable bytes), before parsing it with html.fromstring.

  2. Catch the UnicodeDecodeError, or all errors.
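A rough sketch of the first step, assuming the response body is available as a byte string (the sample bytes below are made up). Python's built-in `ignore` and `replace` error handlers let the decode proceed past a byte such as 0x92 without raising. This loses or substitutes characters, so decoding with the correct charset is preferable when it is known.

```python
# Sample bytes standing in for the HTTP response body (hypothetical data).
raw = b"I couldn\x92t be happier"

# "ignore" silently drops bytes that are not valid UTF-8.
ignored = raw.decode("utf-8", "ignore")

# "replace" substitutes U+FFFD (the replacement character) for each bad byte.
replaced = raw.decode("utf-8", "replace")

# Both results are unicode strings; the expectation is that such a string
# can then be passed to html.fromstring instead of the raw bytes.
```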


