Scrapy shows no errors but fetches no data

Question

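I am trying to parse product names and prices from a site using Scrapy. However, when I run my Scrapy code, it neither shows any error nor fetches any data. What I am doing wrong is beyond my ability to find out. I hope someone will take a look.

"items.py" includes: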

import scrapy
class SephoraItem(scrapy.Item):
    Name = scrapy.Field()
    Price = scrapy.Field()

The spider file, named "sephorasp.py", contains:

from scrapy.spiders import CrawlSpider, Rule  # scrapy.contrib.spiders is the old, deprecated path
from scrapy.linkextractors import LinkExtractor

class SephoraspSpider(CrawlSpider):
    name = "sephorasp"
    allowed_domains = ['sephora.ae']
    start_urls = ["https://www.sephora.ae/en/stores/"]
    rules = [
        Rule(LinkExtractor(restrict_xpaths='//li[@class="level0 nav-1 active first touch-dd  parent"]')),
        Rule(LinkExtractor(restrict_xpaths='//li[@class="level2 nav-1-1-1 active first"]'),
             callback="parse_item"),
    ]

    def parse_item(self, response):
        page = response.xpath('//div[@class="product-info"]')
        for titles in page:
            Product = titles.xpath('.//a[@title]/text()').extract()
            Rate = titles.xpath('.//span[@class="price"]/text()').extract()
            yield {'Name':Product,'Price':Rate}

Here is the link to the log: "https://www.dropbox.com/s/8xktgh7lvj4uhbh/output.log?dl=0"

When I tried it with BaseSpider, it worked:

from scrapy.spiders import Spider as BaseSpider  # BaseSpider is the old name of Spider
from scrapy.http.request import Request

class SephoraspSpider(BaseSpider):
    name = "sephorasp"
    allowed_domains = ['sephora.ae']
    start_urls = [
        "https://www.sephora.ae/en/travel-size/make-up",
        "https://www.sephora.ae/en/perfume/women-perfume",
        "https://www.sephora.ae/en/makeup/eye/eyeshadow",
        "https://www.sephora.ae/en/skincare/moisturizers",
        "https://www.sephora.ae/en/gifts/palettes",
    ]

    def pro(self, response):
        item_links = response.xpath('//a[contains(@class,"level0")]/@href').extract()
        for a in item_links:
            yield Request(a, callback = self.end)

    def end(self, response):
        item_link = response.xpath('//a[@class="level2"]/@href').extract()
        for b in item_link:
            yield Request(b, callback = self.parse)

    def parse(self, response):
        page = response.xpath('//div[@class="product-info"]')
        for titles in page:
            Product= titles.xpath('.//a[@title]/text()').extract()
            Rate= titles.xpath('.//span[@class="price"]/text()').extract()
            yield {'Name':Product,'Price':Rate}

Answer 1

Your XPath expressions are seriously flawed.

Rule(LinkExtractor(restrict_xpaths='//li[@class="level0 nav-1 active first touch-dd  parent"]')),
Rule(LinkExtractor(restrict_xpaths='//li[@class="level2 nav-1-1-1 active first"]'),

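You are matching the entire value of the class attribute, which can change at any time and whose class names may appear in a different order. Select on just one class instead; it is most likely unique enough: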

Rule(LinkExtractor(restrict_xpaths='//li[contains(@class,"level0")]')),
Rule(LinkExtractor(restrict_xpaths='//li[contains(@class,"level2")]'), callback="parse_item"),
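
To see why the exact match is fragile, here is a minimal sketch that uses only the standard library rather than Scrapy itself. The markup is made up for illustration (it is not the real sephora.ae HTML), and `links_exact` and `links_with_token` are hypothetical helpers that stand in for `@class="..."` and a single-class test respectively.

```python
import xml.etree.ElementTree as ET

# Hypothetical navigation markup: two top-level menu items whose class
# lists differ in order and in which state classes are present.
HTML = """
<ul>
  <li class="level0 nav-1 active first parent"><a href="/en/makeup">Makeup</a></li>
  <li class="parent first nav-2 level0"><a href="/en/skincare">Skincare</a></li>
  <li class="level1 nav-2-1"><a href="/en/skincare/masks">Masks</a></li>
</ul>
"""

root = ET.fromstring(HTML)


def links_exact(class_value):
    """Mimic @class="...": the whole attribute string must be identical."""
    return [li.find("a").get("href")
            for li in root.iter("li")
            if li.get("class") == class_value]


def links_with_token(token):
    """Match any element that carries `token` as one of its class names."""
    return [li.find("a").get("href")
            for li in root.iter("li")
            if token in li.get("class", "").split()]


exact = links_exact("level0 nav-1 active first parent")
by_token = links_with_token("level0")

print(exact)     # only the item whose attribute string is identical
print(by_token)  # every item that has the level0 class
```

The exact comparison finds one link and silently skips the second top-level item, which is the same way a spider can end up with no errors and no data. Note that `contains(@class,"level0")` is a substring test, so it would also match a class such as `level01`; if that matters, `contains(concat(" ", normalize-space(@class), " "), " level0 ")` or a CSS selector like `li.level0` is stricter.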
