Beautiful Soup returns empty HTML

Question

This is my second question about Beautiful Soup (sorry, I'm a beginner).

I was trying to fetch data from this website:

https://www.ccna8.com/ccna4-v6-0-final-exam-full-100-2017/

My code:

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

url = 'https://www.ccna8.com/ccna4-v6-0-final-exam-full-100-2017/'

uClient = uReq(url)
page_html = uClient.read()
uClient.close()
page_soup = soup(page_html, "lxml")

print(page_soup)

But for some reason it returns an empty string.

I've been searching for similar threads, and apparently this usually happens when a website loads its content through external APIs, but this website doesn't do that.

Answer 1

It seems that the response body is gzip-compressed (Content-Encoding: gzip), so you need to decompress it before you can parse the HTML.

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
import gzip

url = 'https://www.ccna8.com/ccna4-v6-0-final-exam-full-100-2017/'

uClient = uReq(url)
page_html = gzip.decompress(uClient.read())
uClient.close()
page_soup = soup(page_html, "lxml")
print(page_soup)
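
One caveat with the snippet above: gzip.decompress raises an error if the server ever sends an uncompressed body. A minimal sketch of a more defensive variant follows, assuming it is enough to check for the gzip magic bytes at the start of the body. The names maybe_gunzip and fetch_html are hypothetical helpers, not part of urllib or bs4.

```python
import gzip
from urllib.request import urlopen

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def maybe_gunzip(raw: bytes) -> bytes:
    """Decompress raw only if it starts with the gzip magic number."""
    if raw[:2] == GZIP_MAGIC:
        return gzip.decompress(raw)
    return raw


def fetch_html(url: str) -> bytes:
    """Download url and return the body, decompressed when necessary."""
    with urlopen(url) as response:
        return maybe_gunzip(response.read())
```

With these helpers, the parsing step would become page_soup = soup(fetch_html(url), "lxml").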

Answer 2

Try using the requests module, which decompresses gzip-encoded responses automatically.

Example:

import requests
from bs4 import BeautifulSoup as soup

url = 'https://www.ccna8.com/ccna4-v6-0-final-exam-full-100-2017/'

uClient = requests.get(url)
page_soup = soup(uClient.text, "lxml")
print(page_soup)
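
If the result is still empty, it can help to rule out an HTTP error or a hung connection first. The following is only a sketch: fetch_soup is a hypothetical helper, the 10-second timeout is an arbitrary assumption, and "html.parser" is used instead of "lxml" so that no extra parser package is required.

```python
import requests
from bs4 import BeautifulSoup


def fetch_soup(url: str, timeout: float = 10.0) -> BeautifulSoup:
    """Download url with requests and parse the decoded body."""
    response = requests.get(url, timeout=timeout)
    # Raise requests.HTTPError on 4xx/5xx instead of parsing an error page.
    response.raise_for_status()
    # response.text is already decompressed and decoded to str by requests.
    return BeautifulSoup(response.text, "html.parser")
```

Calling print(fetch_soup(url)) would then either print the parsed page or raise an exception that explains why nothing came back.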


Answer 1: score 2, accepted, 2018-03-30 15:29:36

Answer 2: score 1, 2018-03-30 15:19:37
