Scrapy: returns empty list
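I am trying to build an OLX scraper and have run into a problem: I get the expected response in the Scrapy shell, but nothing arrives in the pipeline array.
My scraper: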
import scrapy
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor
from olx.items import OlxItem


class ElectronicsSpider(CrawlSpider):
    name = "electronics"
    allowed_domains = ["www.olx.in"]
    start_urls = [
        'https://www.olx.in/computers-accessories/',
        'https://www.olx.in/tv-video-audio/',
        'https://www.olx.in/games-entertainment/'
    ]

    rules = (
        Rule(LinkExtractor(allow=(), restrict_css=('.pageNextPrev',)),
             callback="parse_item",
             follow=True),)

    def parse_item(self, response):
        item_links = response.css('.large > .detailsLink::attr(href)').extract()
        for a in item_links:
            yield scrapy.Request(a, callback=self.parse_detail_page)

    def parse_detail_page(self, response):
        title = response.css('h1::text').extract()[0].strip()
        price = response.css('.pricelabel > strong::text').extract()[0]
        item = OlxItem()
        item['title'] = title
        item['price'] = price
        item['url'] = response.url
        yield item
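Running the crawl with logging suppressed, scrapy crawl --nolog electronics, prints nothing at all. With logging enabled, the log looks like this, which suggests that an empty list is being passed to the item pipelines: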
...
2020-07-14 18:43:43 [scrapy.middleware] INFO: Enabled item pipelines:
[]
...
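A note on reading that log line: `Enabled item pipelines: []` lists the pipeline classes activated through the `ITEM_PIPELINES` setting, so an empty list there means that no pipeline is configured, not that no items were scraped. The following is a minimal sketch of what enabling one could look like. The class name `CollectItemsPipeline` and the module path `olx.pipelines` are assumptions for illustration and do not appear in the original project.

```python
# Hypothetical contents of olx/pipelines.py (the name and path are assumptions).
class CollectItemsPipeline:
    """Collects every scraped item in a list so it can be inspected later."""

    def open_spider(self, spider):
        # Called once when the spider starts; set up the storage here.
        self.items = []

    def process_item(self, item, spider):
        # Called for every item the spider yields. Returning the item
        # passes it on to the next pipeline, if any.
        self.items.append(dict(item))
        return item


# Hypothetical entry for settings.py: maps the pipeline's import path to
# an order value (lower numbers run first).
ITEM_PIPELINES = {
    "olx.pipelines.CollectItemsPipeline": 300,
}
```

With an entry like this in place, the log would be expected to list the pipeline class instead of `[]`.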
I have changed parse_detail_page to parse. Please see the documentation.
import scrapy
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor
# from olx.items import OlxItem


class ElectronicsSpider(CrawlSpider):
    name = "electronics"
    allowed_domains = ["www.olx.in"]
    start_urls = [
        'https://www.olx.in/computers-accessories/',
        'https://www.olx.in/tv-video-audio/',
        'https://www.olx.in/games-entertainment/'
    ]

    rules = (
        Rule(LinkExtractor(allow=(), restrict_css=('.pageNextPrev',)),
             callback="parse_item",
             follow=True),)

    def parse_item(self, response):
        item_links = response.css('.large > .detailsLink::attr(href)').extract()
        for a in item_links:
            yield scrapy.Request(a)

    def parse(self, response):
        title = response.css('span._2tW1I::text').extract()[0].strip()
        price = response.css('span._89yzn::text').extract()[0]
        print()
        print()
        yield {
            'title': title,
            'price': price,
            'url': response.url
        }
Output here
{'title': 'Fiber splicing machine', 'price': '₹ 1,55,000', 'url': 'https://www.olx.in/computers-laptops_c1505', 'field_name_for_your_processed_files': []}
2020-07-14 22:04:26 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.olx.in/tv-video-audio_c1523> (referer: None)
2020-07-14 22:04:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.olx.in/tv-video-audio_c1523>
{'title': 'Digidesign 003 sound card', 'price': '₹ 35,000', 'url': 'https://www.olx.in/tv-video-audio_c1523', 'field_name_for_your_processed_files': []}
2020-07-14 22:04:30 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.olx.in/games-entertainment_c93> (referer: None)
2020-07-14 22:04:30 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.olx.in/games-entertainment_c93>
{'title': 'ps4,ps3,xbox sales and services(S-GAMERSHOP)', 'price': '₹ 19,000', 'url': 'https://www.olx.in/games-entertainment_c93', 'field_name_for_your_processed_files': []}
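Both versions of the spider index the result of `.extract()` with `[0]`. Because `.extract()` returns a list, that lookup raises `IndexError` whenever a selector matches nothing, for example after a class name such as `_2tW1I` changes on the site. Scrapy selectors provide `.get()`, which returns `None` (or a supplied default) in that case. The sketch below illustrates the same idea in plain Python on an already extracted list; `first_text` is a hypothetical helper, not part of Scrapy or of the original code.

```python
def first_text(values, default=None):
    """Return the first extracted string, stripped, or `default` if nothing matched."""
    for value in values:
        return value.strip()
    return default


# A selector that matched one node, with surrounding whitespace.
print(first_text(["  Fiber splicing machine "]))  # prints: Fiber splicing machine

# A selector that matched nothing: no IndexError, just the default.
print(first_text([]))  # prints: None
```

In the spider itself, a call such as `response.css('h1::text').get()` plays the same role, and the callback can then skip or log pages where the value is `None`.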