Scraping question answers, dates and upvotes from Quora

Question

I'm trying to scrape the answers, dates and upvote figures from this answer using BeautifulSoup, but I cannot select the elements with class="pagedlist_item". I would like to start from this class, which contains the content of each answer, because some posts have no upvotes, for example. If I collected each field separately, a missing value would leave me with lists of different lengths and with values that no longer line up across answers.

items_soup = BeautifulSoup(html, "html")
items_soup.find_all("div", {"class" : "pagedlist_item"})

When I run this code it returns an empty list, and I'm not sure what's wrong. From each item I would then like to extract the text of the answer, the date and the upvote figure (replacing a missing upvote figure with 0).

Is it possible to split each item and get the elements I listed: the answer text, the date of the answer and its upvote figure? The aim is to then create a dataframe.

Keep in mind: the post has 49 answers but does not show all of them unless you scroll down, and I would like to scrape all 49.

Answer 1

I'm able to get what you're looking for with the following code:

import requests
from bs4 import BeautifulSoup

url = 'https://www.quora.com/What-is-the-brutal-truth-about-data-scientists'
r = requests.get(url)
soup = BeautifulSoup(r.text, 'lxml')

question = soup.find('span', {'class': 'ui_qtext_rendered_qtext'})
answers = [ s.text for s in soup.find_all("div", {"class" : "pagedlist_item"}) if s.text ]

This results in question.text == 'What is the brutal truth about data scientists?' and a list of 28 answers.
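The question also asks for the date and upvote figure of each answer, with 0 where no upvote figure exists. A minimal sketch of one way to do that is to loop over the pagedlist_item containers and look up each field inside its own container, so a missing field cannot shift the other values. The sample markup and the class names answer_text, answer_date and upvote_count are assumptions made for illustration; they are not Quora's real class names and would need to be replaced after inspecting the page.

```python
from bs4 import BeautifulSoup

# Stand-in markup for illustration. The class names "answer_text",
# "answer_date" and "upvote_count" are hypothetical, not Quora's real ones.
SAMPLE_HTML = """
<div class="pagedlist_item">
  <div class="answer_text">Most of the job is cleaning data.</div>
  <span class="answer_date">Answered Jan 3, 2019</span>
  <span class="upvote_count">12</span>
</div>
<div class="pagedlist_item">
  <div class="answer_text">Communication matters as much as modelling.</div>
  <span class="answer_date">Answered Feb 9, 2019</span>
</div>
"""


def text_or_default(parent, name, css_class, default=""):
    """Return the stripped text of the first matching child, or a default."""
    node = parent.find(name, {"class": css_class})
    return node.get_text(strip=True) if node is not None else default


def extract_answers(html):
    """Build one record per pagedlist_item so the fields stay aligned."""
    soup = BeautifulSoup(html, "html.parser")
    records = []
    for item in soup.find_all("div", {"class": "pagedlist_item"}):
        text = text_or_default(item, "div", "answer_text")
        if not text:
            continue  # skip containers that hold no answer text
        upvotes = text_or_default(item, "span", "upvote_count", default="0")
        records.append({
            "text": text,
            "date": text_or_default(item, "span", "answer_date"),
            # Labels that are not plain digits (e.g. "1.2k") fall back to 0
            # here and would need their own parsing.
            "upvotes": int(upvotes) if upvotes.isdigit() else 0,
        })
    return records


records = extract_answers(SAMPLE_HTML)
```

The resulting list of dicts can be passed to pandas.DataFrame(records) to get one row per answer. Note that this only covers the answers present in the HTML that requests receives, which is why the list above has 28 entries rather than 49.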

Answer 2

I don't get an empty list when I run the following:

import requests
from bs4 import BeautifulSoup

url = 'https://www.quora.com/What-is-the-brutal-truth-about-data-scientists'
r = requests.get(url).text
soup = BeautifulSoup(r, 'html.parser')
soup.find_all("div", {"class" : "pagedlist_item"})

Please check this out! Not sure whether you included requests.

Answer 1
0 2020-01-24 16:36:32

Answer 2
0 2020-01-24 16:36:46
