I am trying to get HTML data from a site using urllib, but for some sites I end up with unknown characters in Python

Hey guys, I am trying to get HTML data from a site using urllib.urlopen(url).read(), but for some sites all I get is data like this: 6\xbdW\xb6\xd6\xff\xca\x9d\x9bO|\xc0\x96a\xc7\xc8\xf7\xa7\x10-\x8aM{\xf8\x and I have no clue what it is or why I am getting it. I tried googling it; some people said it is an encoding/decoding problem, and I tried that as well, but as you can see, no luck there. So please guide me in this darkness. Here is my code:

import urllib  # Python 2; in Python 3 the equivalent call is urllib.request.urlopen

url = "http://mangafox.me/manga/online_the_comic/c001/1.html"  # fails for this site and some others
page = urllib.urlopen(url).read()
print page

And you already know what happens when this code prints the page.

This page is served gzip-compressed, so you have to decompress it before using the data. Decoding the raw bytes as text fails with:

UnicodeDecodeError: 'ascii' codec can't decode byte 0x8b in position 1: ordinal not in range(128)

The 0x8b byte at position 1 is the second byte of the gzip magic number (0x1f 0x8b), which means the data is in gzip format.
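
The unzip step could look roughly like the sketch below. It is written for Python 3, while the question's code is Python 2 (where the call is urllib.urlopen). The helper names maybe_gunzip and fetch_html are hypothetical, and the UTF-8 default is an assumption; the page may declare a different charset.

```python
import gzip
import urllib.request

# Every gzip stream starts with these two bytes; 0x8b is the byte at position 1.
GZIP_MAGIC = b"\x1f\x8b"


def maybe_gunzip(data: bytes) -> bytes:
    """Return the decompressed bytes if data looks like gzip, else data unchanged."""
    if data[:2] == GZIP_MAGIC:
        return gzip.decompress(data)
    return data


def fetch_html(url: str, encoding: str = "utf-8") -> str:
    """Download a page and return its text, decompressing a gzip body if needed.

    The encoding default is an assumption, not something the site guarantees;
    undecodable bytes are replaced rather than raising UnicodeDecodeError.
    """
    with urllib.request.urlopen(url) as response:
        raw = response.read()
    return maybe_gunzip(raw).decode(encoding, errors="replace")
```

Calling fetch_html(url) with the URL from the question should then return readable HTML instead of raw compressed bytes, provided the site still answers with a gzip body.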

You should take a look at this question:

twitter trends api UnicodeDecodeError: 'utf8' codec can't decode byte 0x8b in position 1: unexpected code byte
