
Scrapy not parsing page_source from Selenium

I'm trying to create a spider with Selenium that searches for merchants on https://www.trustpilot.com and then retrieves the rating/score from the search results. Since there are many merchants to look up, I created a list that Selenium loops through, storing each page_source in a list. The idea is that this list of page sources is then parsed by Scrapy, which returns a .json file with the merchant ratings. After running the spider, the result is 0 pages crawled and an empty .json file. I can't figure out why nothing is parsed. Here is my code:

# -*- coding: utf-8 -*-
import scrapy
from scrapy import Selector
from selenium.webdriver.common.keys import Keys
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from shutil import which

queries = ['yeewtuden.com', '1a.lv', 'grishkoshop.com']


class SeleniumTestSpider(scrapy.Spider):
    name = 'selenium_test'
    allowed_domains = ['www.trustpilot.com']
    start_urls = ["www.trustpilot.com"]

    page_responses = []

    def __init__(self):
        super().__init__()
        chrome_options = Options()
        chrome_options.add_argument("--headless")

        chrome_path = which("chromedriver")

        driver = webdriver.Chrome(executable_path=chrome_path, options=chrome_options)
        driver.implicitly_wait(10)
        driver.get("https://www.trustpilot.com")

        # search_field = driver.find_element_by_xpath("//input[@class='searchInputField___3e9zp']")

        for query in queries:
            search_field = WebDriverWait(driver, 7).until(EC.presence_of_element_located((
                By.CLASS_NAME, 'searchInputField___3e9zp')))
            search_field = driver.find_element_by_xpath("//input[@class='searchInputField___3e9zp']")
            search_field.send_keys(query)
            search_field.send_keys(Keys.ENTER)
            self.page_responses.append(driver.page_source)
            driver.back()

        driver.close()

    def parse(self, response):
        for resp in self.page_responses:
            resp = Selector(text=resp)
            score = resp.xpath("//p[@class='header_trustscore']/text()").get()
            yield {
                'rating': score
            }

You can use the code below, which returns the rating.

Create an instance of the spider class, then call parse to get a generator that yields the ratings.

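# Instantiating the spider runs __init__, which drives Selenium and fills page_responses.
testSpider = SeleniumTestSpider()
# parse() ignores its response argument and iterates over self.page_responses.
parseGenerator = testSpider.parse(testSpider.page_responses)
for i in parseGenerator:
    print(i, end=" ")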
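
A separate point worth checking when the spider is run with `scrapy crawl`: `start_urls` in the question holds "www.trustpilot.com" with no scheme. Scrapy refuses to build a request from a URL without a scheme (it raises `ValueError: Missing scheme in request url`), so no response would reach `parse`, which would be consistent with the "0 pages crawled" result. The sketch below uses only the standard library to show the check; `ensure_scheme` is a hypothetical helper, not part of Scrapy, and defaulting to `https` is an assumption.

```python
from urllib.parse import urlsplit


def ensure_scheme(url, default_scheme="https"):
    """Return url unchanged if it already has a scheme, else prefix one.

    Hypothetical helper for illustration; not part of Scrapy.
    """
    if urlsplit(url).scheme:
        return url
    return f"{default_scheme}://{url}"


# The value from the question, normalised before Scrapy sees it.
start_urls = [ensure_scheme(u) for u in ["www.trustpilot.com"]]
print(start_urls)  # ['https://www.trustpilot.com']
```

In the spider itself, the simpler change is to write `start_urls = ["https://www.trustpilot.com"]` directly.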
