'utf-8' codec can't decode byte 0xf6 in position 139604: invalid start byte

Question

I am working on a knowledge engineering project.

This error occurred while I was crawling the personal websites of some scientists.

import html2text
import requests
from urllib.request import urlopen
from bs4 import BeautifulSoup
import re
import urllib


homepage = "http://angom.myweb.cs.uwindsor.ca"
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) Gecko/20100101 Firefox/23.0'}
req = urllib.request.Request(url=homepage, headers=headers)
print(req)
c = urlopen(req).read()
print(type(c))

content = urlopen(req).read().decode("utf-8")

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf6 in position 139604: invalid start byte

Answer 1

The page's head section declares its encoding:

<meta http-equiv=Content-Type content="text/html; charset=windows-1252">

...so use that encoding when decoding the bytes.

content = urlopen(req).read().decode("windows-1252")

will work in this case.
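To illustrate why the original call fails, here is a minimal sketch using sample bytes in place of the downloaded page. The byte 0xf6 can never appear in valid UTF-8, while in windows-1252 it is the letter "ö". The `raw` sample and the `declared_charset` helper are hypothetical names introduced for this example, and the regular expression is only a rough approximation of how a charset declaration can be found.

import re

# Sample bytes standing in for the downloaded page (hypothetical content).
raw = (b'<meta http-equiv=Content-Type content="text/html; charset=windows-1252">'
       b'<p>K\xf6nig</p>')

def declared_charset(data, default="utf-8"):
    """Return the charset named in a <meta> tag, or `default` if none is found."""
    # Only the start of the document is searched; declarations normally sit in <head>.
    match = re.search(rb'<meta[^>]+charset\s*=\s*["\']?([\w-]+)',
                      data[:4096], re.IGNORECASE)
    return match.group(1).decode("ascii") if match else default

# Decoding as UTF-8 fails on the 0xf6 byte, as in the question.
try:
    raw.decode("utf-8")
except UnicodeDecodeError as err:
    print("utf-8 failed:", err.reason)

# Decoding with the declared encoding succeeds.
encoding = declared_charset(raw)
text = raw.decode(encoding)
print(encoding, text)

With a real response, the same helper could be applied to the bytes returned by urlopen(req).read() before calling decode().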

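If you plan to use BeautifulSoup, it already does a very good job of determining the encoding on its own, so you can pass it the raw bytes without decoding them first.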
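A minimal sketch of that approach, assuming the bs4 package is installed and using sample bytes (the hypothetical `raw` below) rather than the live page:

from bs4 import BeautifulSoup

# Sample bytes standing in for urlopen(req).read() (hypothetical content).
raw = (b'<html><head>'
       b'<meta http-equiv=Content-Type content="text/html; charset=windows-1252">'
       b'</head><body><p>K\xf6nig</p></body></html>')

# Pass the undecoded bytes; BeautifulSoup works out the encoding itself.
soup = BeautifulSoup(raw, "html.parser")

# original_encoding reports which encoding BeautifulSoup settled on.
print(soup.original_encoding)
print(soup.p.get_text())

If the detection ever guesses wrong, the encoding can be forced with the from_encoding argument, for example BeautifulSoup(raw, "html.parser", from_encoding="windows-1252").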

Answer 1: 0 2017-07-18 04:20:28
