Python BeautifulSoup: Retrieving review information from the Google Play Store

Question

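I am writing a program to retrieve information about the reviews that users post on the Google Play Store: the reviewer's name, the rating, the review date, the review's likes or dislikes, and the review text. I am using BeautifulSoup for this, but I am having trouble retrieving that information. Let me explain with an example. I want to retrieve the review information from the following link: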

https://play.google.com/store/apps/details?id=com.education.educationkids&hl=en&showAllReviews=true

Here is the code of my program:

import urllib.request
import bs4 as bs
html = urllib.request.urlopen('https://play.google.com/store/apps/details?id=com.education.educationkids&hl=en&showAllReviews=true').read()
soup = bs.BeautifulSoup(html, 'html.parser')

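I want to retrieve the information listed above. When I inspected the page elements, I found that the div named “fk8dgd” contains all of the review-related information (as shown in the figure).

To retrieve the reviewer's text, I used the following command: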

soup.find('div',{'jscontroller':'H6eOGe'}).get_text()

However, the command raises an error:

AttributeError: 'NoneType' object has no attribute 'get_text'

I am not sure where I went wrong. Can someone help me solve this problem?

Answer 1

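It fails because the review markup is rendered by the browser after the page loads, so the HTML that urllib downloads does not contain the div you are looking for, and soup.find() returns None.

The fix is to load the page entirely through Selenium and then let BeautifulSoup search the rendered content.

Here is the code: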

import bs4 as bs
from selenium import webdriver

# Start a real browser so the page's JavaScript runs (requires Chrome)
driver = webdriver.Chrome()

driver.get('https://play.google.com/store/apps/details?id=com.education.educationkids&hl=en&showAllReviews=true')

# Parse the browser-rendered DOM instead of the raw HTML returned by urllib.request
soup = bs.BeautifulSoup(driver.page_source, 'html.parser')
driver.quit()  # close the browser once the page source has been captured

print(soup.find('div',{'jscontroller':'H6eOGe'}).get_text())
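
To see why the original call raised AttributeError, here is a minimal offline sketch. The two HTML strings are made-up stand-ins, not real Google Play markup: one imitates what urllib receives before any script has run, the other imitates the DOM after the browser has rendered it. The helper first_review_text is a hypothetical name used only for this illustration. It checks for None before calling get_text(), which is a reasonable guard in either approach, since the jscontroller value may change or the element may not have loaded yet.

```python
import bs4 as bs

# Hypothetical stand-in for the raw HTML urllib receives: the review
# container is missing because a script would create it later.
STATIC_HTML = (
    "<html><body><div id='app'></div>"
    "<script>/* builds the reviews in the browser */</script></body></html>"
)

# Hypothetical stand-in for the DOM after the browser has run the scripts.
RENDERED_HTML = (
    "<html><body><div id='app'>"
    "<div jscontroller='H6eOGe'><span>Great app for kids</span></div>"
    "</div></body></html>"
)


def first_review_text(html):
    """Return the text of the first review container, or None if it is absent."""
    soup = bs.BeautifulSoup(html, 'html.parser')
    review = soup.find('div', {'jscontroller': 'H6eOGe'})
    if review is None:
        # find() returns None when nothing matches; calling get_text() on it
        # is what produces the AttributeError from the question.
        return None
    return review.get_text(strip=True)


print(first_review_text(STATIC_HTML))    # None
print(first_review_text(RENDERED_HTML))  # Great app for kids
```

With Selenium, driver.page_source plays the role of RENDERED_HTML here, provided the reviews have finished loading when it is read.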
