Use Beautiful Soup to scrape all questions a person has answered on Quora
How do I write a Beautiful Soup script that scrapes all the questions a specific user has answered?
Input:
The author's URL
(e.g. https://www.quora.com/profile/AUTHOR/answers)
Output:
Column 1: the question the author answered
e.g. "Lorem Ipsum question"
Column 2: the URL of the answered question
e.g. https://www.quora.com/lorem-ipsum-question
Column 3: the URL of the answered question
e.g. https://www.quora.com/lorem-ipsum-question
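This script prints all the answers/URLs present in the initially loaded page. The page also uses infinite scrolling, which sends POST requests to https://www.quora.com/graphql/gql_para_POST?q=UserProfileAnswersMostRecent_RecentAnswers_Query, but I couldn't get the data out of them (you can see these requests in Developer Tools -> Network tab):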
import re
import json
import requests

url = 'https://www.quora.com/profile/Nana-Bello-Shehu/answers'
html_data = requests.get(url).text

# The first batch of answers is embedded in the HTML as a JSON-encoded string literal
d = re.findall(r'window\.ansFrontendGlobals\.data\.inlineQueryResults\.results\[".*?"\] = ("{.*}");', html_data)[-1]
# Decode twice: first the string literal, then the JSON object inside it
d = json.loads(json.loads(d))

for e in d['data']['user']['recentPublicAndPinnedAnswersConnection']['edges']:
    # Skip any item in the feed that is not an answer
    if e['node']['__typename'] != 'Answer':
        continue
    # The question title is itself stored as JSON (rich-text sections and spans)
    q = json.loads(e['node']['question']['title'])
    title = q['sections'][0]['spans'][0]['text']
    u = 'https://www.quora.com' + e['node']['question']['url']
    print('{:<90} {}'.format(title, u))
Prints:
Do pictures speak louder than words? https://www.quora.com/Do-pictures-speak-louder-than-words
Does true love exist? https://www.quora.com/Does-true-love-exist-8
What picture made your blood boil? https://www.quora.com/What-picture-made-your-blood-boil
What are the before and after pics of people who are drug addicts for several years? https://www.quora.com/What-are-the-before-and-after-pics-of-people-who-are-drug-addicts-for-several-years
What was the funniest thing you saw/heard today? https://www.quora.com/What-was-the-funniest-thing-you-saw-heard-today
Are there any truly selfless acts, motives, or people? https://www.quora.com/Are-there-any-truly-selfless-acts-motives-or-people
Which famous person in history who is idolized, was actually a horrible person? https://www.quora.com/Which-famous-person-in-history-who-is-idolized-was-actually-a-horrible-person
What is something that you read recently and is worth sharing? https://www.quora.com/What-is-something-that-you-read-recently-and-is-worth-sharing
How do I get the attention of my crush? https://www.quora.com/How-do-I-get-the-attention-of-my-crush
What are some heart touching stories of best friends? https://www.quora.com/What-are-some-heart-touching-stories-of-best-friends
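The question asks for the results as columns rather than as printed lines. A minimal sketch, assuming `d` is the decoded dictionary produced by the script above and that its key layout stays the same: the hypothetical helpers `extract_rows` and `write_csv` collect (question, URL) pairs and write them as CSV columns. On the further assumption that a title can be split across several rich-text spans, it joins the text of every span instead of taking only the first one.

```python
import csv
import json


def extract_rows(d):
    """Return (title, url) pairs from the decoded answers JSON (hypothetical helper)."""
    rows = []
    for e in d['data']['user']['recentPublicAndPinnedAnswersConnection']['edges']:
        node = e['node']
        # Skip any item in the feed that is not an answer
        if node['__typename'] != 'Answer':
            continue
        # Assumption: the title is rich-text JSON; join every span of every section
        q = json.loads(node['question']['title'])
        title = ''.join(span['text']
                        for section in q['sections']
                        for span in section['spans'])
        rows.append((title, 'https://www.quora.com' + node['question']['url']))
    return rows


def write_csv(rows, out):
    """Write a header row plus one row per answered question to a file-like object."""
    writer = csv.writer(out)
    writer.writerow(['question', 'question_url'])
    writer.writerows(rows)
```

To save to a file, open it with `open('answers.csv', 'w', newline='', encoding='utf-8')` and pass it as `out`; the `csv` module expects `newline=''` so it can control line endings itself.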
I think the easiest way is to use Selenium:
import time

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.service import Service

# Path to geckodriver; change it if geckodriver is installed elsewhere
driver = webdriver.Firefox(service=Service(executable_path='c:/program/geckodriver.exe'))

url = 'https://www.quora.com/profile/Nana-Bello-Shehu/answers'
driver.get(url)

SCROLL_TIME = 2  # seconds to wait for new answers to load after each scroll

# Keep scrolling to the bottom until the page height stops growing,
# i.e. the infinite scroll has no more answers to load
last_height = driver.execute_script("return document.body.scrollHeight")
while True:
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(SCROLL_TIME)
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height

# One element per answer block (these are Quora's class names and may change)
qbox = driver.find_elements(By.CSS_SELECTOR, '.qu-pb--medium')
for qb in qbox:
    print(qb.find_element(By.CSS_SELECTOR, 'span.qu-userSelect--text').text)
    # get_attribute('href') already returns an absolute URL, so no domain prefix is added.
    # In the output below this selector matched the author's profile link, not the question link.
    print(qb.find_element(By.CSS_SELECTOR, 'a.q-box.qu-cursor--pointer.qu-hover--textDecoration--underline').get_attribute('href'))
    print('\n')

driver.quit()
Output:
Do pictures speak louder than words?
https://www.quora.com/profile/Nana-Bello-Shehu
Does true love exist?
https://www.quora.com/profile/Nana-Bello-Shehu
What picture made your blood boil?
https://www.quora.com/profile/Nana-Bello-Shehu
What are the before and after pics of people who are drug addicts for several years?
https://www.quora.com/profile/Nana-Bello-Shehu
What was the funniest thing you saw/heard today?
https://www.quora.com/profile/Nana-Bello-Shehu
Are there any truly selfless acts, motives, or people?
https://www.quora.com/profile/Nana-Bello-Shehu
etc...
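This script scrolls to the end of the page and collects all the questions. You can try a lower SCROLL_TIME to make the script faster, but with a shorter wait the script sometimes stops before the page has actually reached the end.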
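One way to make a shorter wait less likely to stop early, sketched under assumptions rather than taken from the answer: end the loop only after the page height has stayed the same for several consecutive checks, so a single slow load does not end the scrolling. `scroll_to_bottom`, `pause` and `patience` are hypothetical names; the function only relies on `driver.execute_script`, which Selenium's WebDriver provides. The `sleep` parameter exists only so the loop can be tested without waiting.

```python
import time


def scroll_to_bottom(driver, pause=1.0, patience=3, sleep=time.sleep):
    """Scroll until document.body.scrollHeight stops growing (hypothetical helper).

    Stops only after `patience` consecutive checks without growth, so a
    shorter `pause` is less likely to end the loop before more answers load.
    Returns the number of scrolls performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    unchanged = 0
    scrolls = 0
    while unchanged < patience:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        scrolls += 1
        sleep(pause)
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            unchanged += 1  # no new content this round
        else:
            unchanged = 0   # page grew, reset the counter
            last_height = new_height
    return scrolls
```

Called as `scroll_to_bottom(driver, pause=1.0, patience=3)` in place of the `while True` loop, it trades a fixed long wait per scroll for at most `patience * pause` seconds of extra waiting once the end of the list is reached.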
Note:
I load geckodriver from c:/program/geckodriver.exe, so if you installed geckodriver at a different path you need to change executable_path accordingly.