How to scrape the ratings and all the reviews from the website using Selenium

I want to scrape the rating and all the reviews on the page, but I am not able to find the right path.

import urllib.request
from bs4 import BeautifulSoup
import csv
import os
from selenium import webdriver
from selenium.webdriver.support.select import Select
from selenium.webdriver.common.keys import Keys
import pandas as pd
import time
chrome_path =r'C:/Users/91940/AppData/Local/Programs/Python/Python39/Scripts/chromedriver.exe'
driver = webdriver.Chrome(executable_path=chrome_path)
driver.implicitly_wait(10)
driver.get("https://www.lazada.sg/products/samsung-galaxy-watch3-bt-45mm-titanium-i1156462257-s4537770883.html?search=1&freeshipping=1")
product_name = driver.find_element_by_xpath('//*[@id="module_product_title_1"]/div/div/h1')
print(product_name.text)
rating = driver.find_element_by_xpath("//span[@class='score-average']")
print(rating.text)
review = driver.find_element_by_xpath('//*[@id="module_product_review"]/div/div/div[3]/div[1]/div[1]')
print(review.text)

Perhaps there is a problem with your path? (Apologies, I'm not on Windows to test.) From memory, Windows paths use \ characters instead of / . Additionally, you may need two backslashes after the drive letter ( C:\\ ).

c:\\Users\91940\AppData\Local\...
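To check the path question without a Windows machine, here is a minimal sketch using only the standard library's pathlib. The user name "example" is a placeholder, not the asker's real path. It suggests that Python treats forward slashes, escaped backslashes, and raw-string backslashes as the same Windows path, so the separator style alone may not be what breaks the script.

```python
from pathlib import PureWindowsPath

# Three ways to spell the same Windows path in Python source.
# "example" is a hypothetical user name used only for illustration.
forward = PureWindowsPath("C:/Users/example/chromedriver.exe")      # forward slashes
escaped = PureWindowsPath("C:\\Users\\example\\chromedriver.exe")   # escaped backslashes
raw = PureWindowsPath(r"C:\Users\example\chromedriver.exe")         # raw string

# PureWindowsPath normalizes separators, so all three compare equal.
same_path = forward == escaped == raw
print(same_path)     # True
print(str(forward))  # C:\Users\example\chromedriver.exe
```

Note that a plain (non-raw) string such as "C:\Users\..." with single backslashes is a syntax error in Python 3, because \U starts a Unicode escape. Use one of the three forms above instead.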

I believe print(product_name.text) is executing correctly, right?

There is an issue with driver.find_element_by_xpath("//span[@class='score-average']"): I could not find score-average anywhere in the HTML source,

so try this instead:

rate = driver.find_element_by_css_selector("div.pdp-review-summary")
print(rate.text)

You can try the code below to get the reviews:

wait = WebDriverWait(driver, 10)
driver.get("https://www.lazada.sg/products/samsung-galaxy-watch3-bt-45mm-titanium-i1156462257-s4537770883.html?search=1&freeshipping=1")
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "a[class$='pdp-review-summary__link']"))).click()
ActionChains(driver).move_to_element(wait.until(EC.visibility_of_element_located((By.XPATH, "//h2[contains(text(), 'Ratings & Reviews')]")))).perform()
wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div.item-content")))
for review in driver.find_elements(By.CSS_SELECTOR, "div.item-content"):
    print(review.get_attribute('innerHTML'))

Imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.action_chains import ActionChains
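The loop above prints each review's innerHTML, which still contains markup. If plain text is what you need, one option is to strip the tags afterwards and collect the result as CSV. The following is a rough sketch using only the standard library. The helper names html_to_text and reviews_to_csv, and the sample fragment, are made up for illustration and are not taken from the page or from Selenium.

```python
import csv
import io
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        # Keep non-empty text chunks, trimmed of surrounding whitespace.
        if data.strip():
            self.parts.append(data.strip())


def html_to_text(fragment):
    """Return the text of an HTML fragment, joined by single spaces."""
    parser = _TextExtractor()
    parser.feed(fragment)
    parser.close()
    return " ".join(parser.parts)


def reviews_to_csv(fragments):
    """Build CSV text with a single 'review' column from innerHTML fragments."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["review"])
    for fragment in fragments:
        writer.writerow([html_to_text(fragment)])
    return buffer.getvalue()


# Hypothetical fragment standing in for review.get_attribute('innerHTML').
sample = '<div class="content">Great watch, <b>fast</b> delivery</div>'
print(html_to_text(sample))  # Great watch, fast delivery
```

In the scraping loop you would pass review.get_attribute('innerHTML') for each element to reviews_to_csv and write the returned string to a file. Reading review.text directly may be simpler if the element's visible text is all you want.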
