Scrapy neither shows any error nor fetches any data
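I'm trying to parse product names and prices from a site using Scrapy. However, when I run my Scrapy code, it neither shows any error nor fetches any data. I can't work out what I'm doing wrong. I hope someone can take a look at it.
"items.py" contains: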
import scrapy

class SephoraItem(scrapy.Item):
    Name = scrapy.Field()
    Price = scrapy.Field()
The spider file, named "sephorasp.py", contains:
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor

class SephoraspSpider(CrawlSpider):
    name = "sephorasp"
    allowed_domains = ['sephora.ae']
    start_urls = ["https://www.sephora.ae/en/stores/"]
    rules = [
        Rule(LinkExtractor(restrict_xpaths='//li[@class="level0 nav-1 active first touch-dd parent"]')),
        Rule(LinkExtractor(restrict_xpaths='//li[@class="level2 nav-1-1-1 active first"]'),
             callback="parse_item")
    ]

    def parse_item(self, response):
        page = response.xpath('//div[@class="product-info"]')
        for titles in page:
            Product = titles.xpath('.//a[@title]/text()').extract()
            Rate = titles.xpath('.//span[@class="price"]/text()').extract()
            yield {'Name': Product, 'Price': Rate}
Here is the link to the log: "https://www.dropbox.com/s/8xktgh7lvj4uhbh/output.log?dl=0"
When I tried it with BaseSpider, it works:
from scrapy.spider import BaseSpider
from scrapy.http.request import Request

class SephoraspSpider(BaseSpider):
    name = "sephorasp"
    allowed_domains = ['sephora.ae']
    start_urls = [
        "https://www.sephora.ae/en/travel-size/make-up",
        "https://www.sephora.ae/en/perfume/women-perfume",
        "https://www.sephora.ae/en/makeup/eye/eyeshadow",
        "https://www.sephora.ae/en/skincare/moisturizers",
        "https://www.sephora.ae/en/gifts/palettes"
    ]

    def pro(self, response):
        item_links = response.xpath('//a[contains(@class,"level0")]/@href').extract()
        for a in item_links:
            yield Request(a, callback=self.end)

    def end(self, response):
        item_link = response.xpath('//a[@class="level2"]/@href').extract()
        for b in item_link:
            yield Request(b, callback=self.parse)

    def parse(self, response):
        page = response.xpath('//div[@class="product-info"]')
        for titles in page:
            Product = titles.xpath('.//a[@title]/text()').extract()
            Rate = titles.xpath('.//span[@class="price"]/text()').extract()
            yield {'Name': Product, 'Price': Rate}
Your XPaths are seriously flawed.
Rule(LinkExtractor(restrict_xpaths='//li[@class="level0 nav-1 active first touch-dd parent"]')),
Rule(LinkExtractor(restrict_xpaths='//li[@class="level2 nav-1-1-1 active first"]'),
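You are matching the entire class attribute, which can change at any time and whose classes may appear in a different order. Select just one class instead; it is most likely unique enough: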
Rule(LinkExtractor(restrict_xpaths='//li[contains(@class,"level0")]')),
Rule(LinkExtractor(restrict_xpaths='//li[contains(@class,"level2")]')),
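To illustrate why the exact match finds nothing, here is a minimal sketch in plain Python using only the standard library, not Scrapy itself. The markup is made up for the example (the real sephora.ae pages may differ), and the helper names exact_class and has_class are hypothetical. The first helper mimics li[@class="..."], which compares the whole attribute string; the second checks for a single class token, which is slightly stricter than the substring test that contains(@class, ...) performs.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the class lists differ slightly from the strings
# hard-coded in the question (no "touch-dd", no "active" on the level2 item).
HTML = """
<ul>
  <li class="level0 nav-1 active first parent"><a href="/en/makeup">Makeup</a></li>
  <li class="level0 nav-2 parent"><a href="/en/skincare">Skincare</a></li>
  <li class="level2 nav-1-1-1 first"><a href="/en/makeup/eye/eyeshadow">Eyeshadow</a></li>
</ul>
"""

root = ET.fromstring(HTML)


def exact_class(root, value):
    """Mimic //li[@class="..."]: the whole attribute string must be equal."""
    return [li.find("a").get("href")
            for li in root.iter("li")
            if li.get("class") == value]


def has_class(root, token):
    """Match one class token, regardless of order or of other classes."""
    return [li.find("a").get("href")
            for li in root.iter("li")
            if token in li.get("class", "").split()]


# The exact string from the question matches nothing in this markup.
exact_level0 = exact_class(root, "level0 nav-1 active first touch-dd parent")
# Matching a single class finds every top-level and second-level item.
token_level0 = has_class(root, "level0")
token_level2 = has_class(root, "level2")

print(exact_level0)
print(token_level0)
print(token_level2)
```

Note that contains(@class,"level2") is a substring test, so it would also match a class such as "level20". If that matters on your page, the usual XPath idiom for matching a whole class token is contains(concat(" ", normalize-space(@class), " "), " level2 ").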