Beautiful Soup returning empty HTML
So this is my second question about Beautiful Soup (sorry, I'm a beginner).
I am trying to get data from this website:
https://www.ccna8.com/ccna4-v6-0-final-exam-full-100-2017/
My code:
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
url = 'https://www.ccna8.com/ccna4-v6-0-final-exam-full-100-2017/'
uClient = uReq(url)
page_html = uClient.read()
uClient.close()
page_soup = soup(page_html, "lxml")
print(page_soup)
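But for some reason it returns an empty string.
I have been looking through similar threads, and apparently this happens with websites that load their content through an external API, but this site doesn't do that.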
It looks like the response is gzip-compressed (Content-Encoding: gzip), so you need to decompress the body before you can parse it as HTML.
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
import gzip
url = 'https://www.ccna8.com/ccna4-v6-0-final-exam-full-100-2017/'
uClient = uReq(url)
page_html = gzip.decompress(uClient.read())
uClient.close()
page_soup = soup(page_html, "lxml")
print(page_soup)
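Calling gzip.decompress unconditionally raises an error if the server ever sends an uncompressed body. A minimal sketch of a more defensive variant follows, assuming the server either reports gzip in its Content-Encoding header or sends a body that starts with the gzip magic bytes. The helper names decode_body and fetch_html are hypothetical and are not part of urllib or Beautiful Soup.

```python
import gzip
from urllib.request import urlopen

GZIP_MAGIC = b"\x1f\x8b"  # the first two bytes of every gzip stream


def decode_body(raw: bytes, content_encoding: str = "") -> bytes:
    """Return the decompressed body if it looks gzip-compressed, else raw unchanged."""
    # Hypothetical helper: trust either the header value or the magic bytes.
    is_gzip = "gzip" in content_encoding.lower() or raw.startswith(GZIP_MAGIC)
    return gzip.decompress(raw) if is_gzip else raw


def fetch_html(url: str) -> bytes:
    """Download url and undo gzip compression when the server applied it."""
    # Hypothetical helper: the 10-second timeout is an arbitrary choice.
    with urlopen(url, timeout=10) as response:
        encoding = response.headers.get("Content-Encoding", "")
        return decode_body(response.read(), encoding)
```

With these helpers, the parsing step from the answer above would become page_soup = soup(fetch_html(url), "lxml").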
Try using the requests module, which decompresses gzip-encoded responses for you. For example:
import requests
from bs4 import BeautifulSoup as soup
url = 'https://www.ccna8.com/ccna4-v6-0-final-exam-full-100-2017/'
uClient = requests.get(url)
page_soup = soup(uClient.text, "lxml")
print(page_soup)