
How to scrape the ratings and all the reviews from the website using selenium

I want to scrape the rating and all the reviews on the page, but I can't find the right path.

import urllib.request
from bs4 import BeautifulSoup
import csv
import os
from selenium import webdriver
from selenium.webdriver.support.select import Select
from selenium.webdriver.common.keys import Keys
import pandas as pd
import time
chrome_path =r'C:/Users/91940/AppData/Local/Programs/Python/Python39/Scripts/chromedriver.exe'
driver = webdriver.Chrome(executable_path=chrome_path)
driver.implicitly_wait(10)
driver.get("https://www.lazada.sg/products/samsung-galaxy-watch3-bt-45mm-titanium-i1156462257-s4537770883.html?search=1&freeshipping=1")
product_name = driver.find_element_by_xpath('//*[@id="module_product_title_1"]/div/div/h1')
print(product_name.text)
rating = driver.find_element_by_xpath("//span[@class='score-average']")
print(rating.text)
review = driver.find_element_by_xpath('//*[@id="module_product_review"]/div/div/div[3]/div[1]/div[1]')
print(review.text)

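Maybe there is something wrong with your path? (Sorry, I'm not on Windows to test.) From memory, Windows paths use the \ character rather than /. You may also need two backslashes after the drive letter (C:\\).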

c:\\Users\91940\AppData\Local\...
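If the separator is the concern, the small check below may help rule it out. It is only a sketch: the `C:/tools/...` location is a hypothetical stand-in for wherever chromedriver.exe actually lives. It uses the standard-library `pathlib.PureWindowsPath` to compare the forward-slash, raw-string and escaped-backslash spellings of the same path.

```python
from pathlib import PureWindowsPath

# Hypothetical location; substitute the real folder that holds chromedriver.exe.
with_forward_slashes = PureWindowsPath("C:/tools/chromedriver.exe")
with_raw_backslashes = PureWindowsPath(r"C:\tools\chromedriver.exe")
with_escaped_backslashes = PureWindowsPath("C:\\tools\\chromedriver.exe")

# True when all three spellings refer to the same Windows path.
same_path = with_forward_slashes == with_raw_backslashes == with_escaped_backslashes
print(same_path)

# str() renders the path with backslashes regardless of how it was written.
print(with_forward_slashes)
```

The spelling to avoid is a plain (non-raw) string with single backslashes, because sequences such as `\t` are then read as escape characters.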

I believe print(product_name.text) is executing correctly, right?

The problem is with driver.find_element_by_xpath("//span[@class='score-average']"): I can't find score-average anywhere in the HTML source.

So try this:

rate = driver.find_element_by_css_selector("div.pdp-review-summary")
print(rate.text)

You can try the following code to get the reviews:

wait = WebDriverWait(driver, 10)
driver.get("https://www.lazada.sg/products/samsung-galaxy-watch3-bt-45mm-titanium-i1156462257-s4537770883.html?search=1&freeshipping=1")
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "a[class$='pdp-review-summary__link']"))).click()
ActionChains(driver).move_to_element(wait.until(EC.visibility_of_element_located((By.XPATH, "//h2[contains(text(), 'Ratings & Reviews')]")))).perform()
wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div.item-content")))
for review in driver.find_elements(By.CSS_SELECTOR, "div.item-content"):
    print(review.get_attribute('innerHTML'))

Imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.action_chains import ActionChains
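The loop above prints each review's raw innerHTML, so the output still contains markup. If plain text is what you want, something like the following could strip the tags using only the standard library. This is a rough sketch: the helper name `inner_html_to_text` and the sample fragment are made up for illustration, and the real structure inside `div.item-content` may differ.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML fragment and ignores the tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        # Keep only non-empty text, trimmed of surrounding whitespace.
        text = data.strip()
        if text:
            self.parts.append(text)


def inner_html_to_text(fragment):
    """Return the text of an HTML fragment as one space-joined string."""
    parser = _TextExtractor()
    parser.feed(fragment)
    parser.close()
    return " ".join(parser.parts)


# Hypothetical fragment standing in for review.get_attribute('innerHTML').
sample = '<div class="content">Great watch</div><div class="date">2 weeks ago</div>'
print(inner_html_to_text(sample))
```

Inside the loop this would be called as `inner_html_to_text(review.get_attribute('innerHTML'))`. Selenium's own `review.text` may be enough in many cases; the helper is mainly useful when you have already stored the HTML.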

