Beautiful Soup returns empty list

I am new to web scraping. I have been given a task to extract data from: Here

I am choosing the "comments" dataset. Below is my code for scraping.

import requests
from bs4 import BeautifulSoup

url = 'https://www.kaggle.com/hacker-news/hacker-news'
headers = {'User-Agent': 'Mozilla/5.0'}
response = requests.get(url, headers=headers)
response.status_code
response.content
soup = BeautifulSoup(response.content, 'html.parser')
soup.find_all('tbody', class_='TableBody-kSbjpE jGqIxa')

When I execute the last command, it returns: []

So I am stuck here. I know we can get the data from a kernel, but just for practice: where am I going wrong? Am I choosing the wrong class? I want to scrape the data and probably save it to a CSV file or to a NoSQL database, preferably Cassandra.

You are getting [] because the data you want to scrape comes from an API that is called after the web page has loaded, so the HTML you receive does not contain that class.

Open your browser's developer console and check the Network tab, as shown in the screenshot. There you will find the request that returns the data you want to scrape, so you have to make a request to that URL to get the data.

[Screenshot: Network tab of the browser developer tools]

You can retrieve the data from this URL; in the Preview tab you can see all of it.
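As a rough sketch of that approach: once you have found the request in the Network tab, you can call it directly and write the result to a CSV file, which is what the question asks for. The endpoint URL below is a placeholder, not the real Kaggle URL, and the sketch assumes the endpoint returns a JSON list of flat objects. The function names fetch_records and save_csv are made up for illustration. The session argument is expected to be a requests.Session (or anything with a compatible get method).

```python
import csv

# Hypothetical endpoint: replace it with the request URL you find in the Network tab.
API_URL = "https://example.com/api/comments.json"


def fetch_records(session, url, params=None):
    """Call the JSON endpoint directly and return the decoded body."""
    response = session.get(
        url,
        params=params,
        headers={"User-Agent": "Mozilla/5.0"},
        timeout=30,
    )
    response.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error page
    return response.json()


def save_csv(records, path):
    """Write a list of flat dicts to a CSV file; columns are the union of all keys."""
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(records)  # keys missing from a record are written as empty cells
    return fieldnames


# Usage (needs network access and the real endpoint):
# import requests
# records = fetch_records(requests.Session(), API_URL)
# save_csv(records, "comments.csv")
```

If the real response is nested (for example the rows sit under some key of a larger object), you would have to pull the list out before passing it to save_csv.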

Also, if you have good knowledge of Python, you can use this to scrape the data:

https://doc.scrapy.org/en/latest/intro/overview.html

Even though you can see the tbody with class 'TableBody-kSbjpE jGqIxa' in the element inspector, the response to the request you make does not contain this class. See for yourself with print(soup.prettify()). This is most likely because you're not requesting the correct URL.
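To make the difference concrete, here is a small offline sketch. The two HTML strings are invented stand-ins, not Kaggle's actual markup: one imitates the bare page shell that requests receives, the other imitates what the element inspector shows after the page's scripts have run. The helper name count_matches is also made up for this example.

```python
from bs4 import BeautifulSoup

# Stand-in for response.content: a page shell whose table is built later by JavaScript.
STATIC_HTML = """
<html><body>
  <div id="site-content"></div>
  <script src="/static/app.js"></script>
</body></html>
"""

# Stand-in for the DOM shown in the element inspector after the scripts have run.
RENDERED_HTML = """
<html><body>
  <div id="site-content">
    <table><tbody class="TableBody-kSbjpE jGqIxa"><tr><td>row</td></tr></tbody></table>
  </div>
</body></html>
"""

# CSS selector for a tbody carrying both classes, in any order.
SELECTOR = "tbody.TableBody-kSbjpE.jGqIxa"


def count_matches(html, selector):
    """Parse the HTML and count the elements matching the CSS selector."""
    soup = BeautifulSoup(html, "html.parser")
    return len(soup.select(selector))


print(count_matches(STATIC_HTML, SELECTOR))    # 0
print(count_matches(RENDERED_HTML, SELECTOR))  # 1
```

Running the same check on your real response.content should tell you quickly whether the element is in the downloaded HTML at all.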

This may not be something you're aware of, but as an FYI: you don't actually need to scrape using BeautifulSoup; you can get a list of all the available datasets from the API. Once you have it installed and configured, you can get the dataset: kaggle datasets download -d . Here's more info if you wish to proceed with the API instead: https://github.com/Kaggle/kaggle-api
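A minimal sketch of building that command from Python. It assumes the dataset identifier is the owner/dataset part of the URL from the question, which is a guess based on the URL layout. Whether this particular dataset can be downloaded this way is not confirmed here. The helper names dataset_slug and download_command are made up for illustration.

```python
from urllib.parse import urlparse


def dataset_slug(url):
    """Return the trailing 'owner/dataset' part of a Kaggle dataset URL (assumed layout)."""
    parts = [part for part in urlparse(url).path.split("/") if part]
    if len(parts) < 2:
        raise ValueError(f"cannot derive owner/dataset from {url!r}")
    return "/".join(parts[-2:])


def download_command(slug):
    """Build the argument list for the Kaggle CLI download command."""
    return ["kaggle", "datasets", "download", "-d", slug]


url = "https://www.kaggle.com/hacker-news/hacker-news"
command = download_command(dataset_slug(url))
print(" ".join(command))  # kaggle datasets download -d hacker-news/hacker-news

# To actually run it (needs the kaggle package installed and an API token configured):
# import subprocess
# subprocess.run(command, check=True)
```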


 