Pull all Yelp reviews via BeautifulSoup

Question

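I need some help extracting all the reviews for a hotel using Beautiful Soup. This is what I have so far, but I need some ideas on how to pull all of the reviews, either through the API or by regular scraping.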

    import time
    import random
    import urllib.request
    from bs4 import BeautifulSoup as bs

    html = urllib.request.urlopen('https://www.yelp.com/biz/shore-cliff-hotel-pismo-beach-2').read().decode('utf-8')
    soup = bs(html, 'html.parser')

    # Review bodies sit in <p> tags with these Yelp class names (hashed names that may change)
    relevant = soup.find_all('p', class_='comment__09f24__gu0rG css-qgunke')

    reviews = []
    for div in relevant:
        for html_class in div.find_all('span', class_="raw__09f24__T4Ezm"):
            review = html_class.getText()
            reviews.append(review)


Answer 1

This does the job:

import requests
from bs4 import BeautifulSoup

base_url = "https://www.yelp.com/biz/capri-laguna-laguna-beach"
new_page = "?start={}"

reviews = []

# 10 reviews per page: start=0, 10, ..., 500 covers all 51 pages
for i in range(0, 501, 10):
  new_page_url = base_url + new_page.format(i)

  # Fetch and parse this page's URL, not the base URL
  new_content = requests.get(new_page_url).content
  new_soup = BeautifulSoup(new_content, "html.parser")

  # Yelp's hashed class names; these may change over time
  relevant = new_soup.find_all('p', class_='comment__09f24__gu0rG css-qgunke')

  for div in relevant:
    for html_class in div.find_all('span', class_="raw__09f24__T4Ezm"):
      review = html_class.getText()
      reviews.append(review)

Code explanation:

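If you click through to the second page, you'll see ?start=10 appended to the base URL https://www.yelp.com/biz/capri-laguna-laguna-beach. On page 3 you'll see ?start=20, and so on. The number is the zero-based index of the first review on that page, and each page holds 10 reviews. There are 51 pages in total, so the first review on page 51 is review #501, whose zero-based index is 500. The suffix added to the URL for that page is therefore ?start=500.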
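The page-to-offset arithmetic described above (page n starts at review index 10 × (n - 1)) can be sketched as a few small helpers. The function names are hypothetical, and the figure of 10 reviews per page comes from the explanation; if Yelp changes its page size or URL scheme, these would need adjusting.

```python
from typing import List

# Hypothetical helper names; the ?start=N pattern and 10 reviews per page
# are taken from the explanation in the answer.
BASE_URL = "https://www.yelp.com/biz/capri-laguna-laguna-beach"
PAGE_SIZE = 10  # reviews per page


def start_offset(page_number: int, page_size: int = PAGE_SIZE) -> int:
    """Zero-based index of the first review on a 1-based page number."""
    if page_number < 1:
        raise ValueError("page_number must be >= 1")
    return (page_number - 1) * page_size


def page_url(page_number: int, base_url: str = BASE_URL) -> str:
    """URL for one page; page 1 gets start=0, like the answer's loop."""
    return f"{base_url}?start={start_offset(page_number)}"


def all_page_urls(num_pages: int, base_url: str = BASE_URL) -> List[str]:
    """URLs for pages 1..num_pages, same offsets as range(0, num_pages * 10, 10)."""
    return [page_url(n, base_url) for n in range(1, num_pages + 1)]


if __name__ == "__main__":
    urls = all_page_urls(51)
    print(len(urls))  # 51
    print(urls[1])    # ...?start=10
    print(urls[-1])   # ...?start=500
```

With 51 pages this yields exactly the offsets 0, 10, ..., 500 that the answer's range(0, 501, 10) produces.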

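So for each page on the site, the code builds a new URL, fetches that URL's HTML content, creates a soup from it, and pulls the reviews out of this newly created soup.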
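A minimal sketch of the same per-page loop, restructured so it stops at the first page that yields no reviews instead of hard-coding 51 pages, and so it waits a random interval between requests to keep the request rate modest. `collect_reviews` and the `fetch_page_reviews` callback are hypothetical names; the callback would wrap the requests + BeautifulSoup code from the answer and return the review texts for one URL. The sketch assumes, without having been checked against Yelp, that a `?start=` value past the last review produces a page with no matching reviews; `max_pages` caps the loop in case it does not.

```python
import random
import time
from typing import Callable, List


def collect_reviews(
    base_url: str,
    fetch_page_reviews: Callable[[str], List[str]],
    page_size: int = 10,
    max_pages: int = 1000,
    min_delay: float = 1.0,
    max_delay: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Walk ?start=0, ?start=10, ... until a page yields no reviews.

    fetch_page_reviews is a hypothetical callback: given a page URL it
    downloads the page and returns the review texts found on it.
    """
    reviews: List[str] = []
    for page in range(max_pages):
        url = f"{base_url}?start={page * page_size}"
        page_reviews = fetch_page_reviews(url)
        if not page_reviews:
            break  # assumed to mean we are past the last page
        reviews.extend(page_reviews)
        # Random pause between requests so the server is not hammered
        sleep(random.uniform(min_delay, max_delay))
    return reviews


if __name__ == "__main__":
    # Offline demo: a fake site with 25 reviews, 10 per page
    fake_reviews = [f"review {i}" for i in range(25)]

    def fake_fetch(url: str) -> List[str]:
        start = int(url.split("?start=")[1])
        return fake_reviews[start:start + 10]

    result = collect_reviews("https://example.com/biz/demo", fake_fetch, sleep=lambda s: None)
    print(len(result))  # 25
```

Passing the fetcher in as a function keeps the pagination logic testable offline and separate from Yelp's markup, which is the part most likely to break.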

Solution 1
0 2022-04-15 08:12:18