Scrapy: returning an empty list

Question

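I am trying to build an OLX scraper and have run into a problem: in the Scrapy shell I get the response I expect, but nothing shows up in the item pipelines list.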

My spider:

import scrapy
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor
from olx.items import OlxItem


class ElectronicsSpider(CrawlSpider):
    name = "electronics"
    allowed_domains = ["www.olx.in"]
    start_urls = [
        'https://www.olx.in/computers-accessories/',
        'https://www.olx.in/tv-video-audio/',
        'https://www.olx.in/games-entertainment/'
    ]

    rules = (
        Rule(LinkExtractor(allow=(), restrict_css=('.pageNextPrev',)),
             callback="parse_item",
             follow=True),)

    def parse_item(self, response):
        item_links = response.css('.large > .detailsLink::attr(href)').extract()
        for a in item_links:
            yield scrapy.Request(a, callback=self.parse_detail_page)

    def parse_detail_page(self, response):
        title = response.css('h1::text').extract()[0].strip()
        price = response.css('.pricelabel > strong::text').extract()[0]

        item = OlxItem()
        item['title'] = title
        item['price'] = price
        item['url'] = response.url
        yield item

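Running the command without logging, scrapy crawl --nolog electronics, prints nothing at all. That suggests an empty list is being passed to the item pipeline, which matches what the log shows: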

...
2020-07-14 18:43:43 [scrapy.middleware] INFO: Enabled item pipelines:
[]
...

Answer 1

I have changed parse_detail_page to parse. Please refer to the documentation.

    import scrapy
    from scrapy.spiders import CrawlSpider, Rule
    from scrapy.linkextractors import LinkExtractor
    # from olx.items import OlxItem
    
    
    class ElectronicsSpider(CrawlSpider):
        name = "electronics"
        allowed_domains = ["www.olx.in"]
        start_urls = [
            'https://www.olx.in/computers-accessories/',
            'https://www.olx.in/tv-video-audio/',
            'https://www.olx.in/games-entertainment/'
        ]
    
        rules = (
            Rule(LinkExtractor(allow=(), restrict_css=('.pageNextPrev',)),
                 callback="parse_item",
                 follow=True),)
    
        def parse_item(self, response):
            item_links = response.css('.large > .detailsLink::attr(href)').extract()
            for a in item_links:
                yield scrapy.Request(a)
    
        def parse(self, response):
            title = response.css('span._2tW1I::text').extract()[0].strip()
            price = response.css('span._89yzn::text').extract()[0]
            print()
            print()
            yield {
                'title': title,
                'price': price,
                'url': response.url
            }
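One caveat with the code above: `.extract()[0]` raises `IndexError` whenever a selector matches nothing, for example if a page's layout or class names differ from what the spider expects. A minimal sketch of a more forgiving lookup, assuming a hypothetical helper named `first_text` (it is not part of Scrapy) that takes the list of strings returned by `.extract()`:

```python
def first_text(values, default=None):
    """Return the first non-empty, stripped string in values, or default.

    Hypothetical helper for illustration; `values` is expected to be a
    list of strings such as the result of `response.css(...).extract()`.
    """
    for value in values:
        text = value.strip()
        if text:
            return text
    # Nothing matched (or only whitespace matched): fall back instead of raising
    return default
```

In the spider this could be called as `first_text(response.css('span._2tW1I::text').extract())`, and the item skipped when the result is `None`.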

Output:

    {'title': 'Fiber splicing machine', 'price': '₹ 1,55,000', 'url': 'https://www.olx.in/computers-laptops_c1505', 'field_name_for_your_processed_files': []}
    2020-07-14 22:04:26 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.olx.in/tv-video-audio_c1523> (referer: None)
    
    
    2020-07-14 22:04:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.olx.in/tv-video-audio_c1523>
    {'title': 'Digidesign 003 sound card', 'price': '₹ 35,000', 'url': 'https://www.olx.in/tv-video-audio_c1523', 'field_name_for_your_processed_files': []}
    2020-07-14 22:04:30 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.olx.in/games-entertainment_c93> (referer: None)
    
    
    2020-07-14 22:04:30 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.olx.in/games-entertainment_c93>
    {'title': 'ps4,ps3,xbox sales and services(S-GAMERSHOP)', 'price': '₹ 19,000', 'url': 'https://www.olx.in/games-entertainment_c93', 'field_name_for_your_processed_files': []}
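Regarding the `Enabled item pipelines: []` line in the question's log: that line lists the pipeline classes Scrapy has enabled through the `ITEM_PIPELINES` setting, so an empty list there means no pipeline is configured, not that the spider yielded nothing. A minimal sketch of enabling one, assuming a hypothetical `OlxPipeline` class in `olx/pipelines.py` (the question does not show its settings or pipelines modules, so both names are assumptions):

```python
# olx/pipelines.py (hypothetical module; not shown in the question)
class OlxPipeline:
    """Minimal pipeline: tidy the price string and pass the item on."""

    def process_item(self, item, spider):
        # Scrapy calls this once for every item the spider yields
        price = item.get("price")
        if price is not None:
            item["price"] = price.strip()
        return item


# olx/settings.py: with an entry like this, the pipeline appears in the
# "Enabled item pipelines" log line. The number sets the running order.
ITEM_PIPELINES = {
    "olx.pipelines.OlxPipeline": 300,
}
```

Items are still scraped and logged without any pipeline; a pipeline is only needed for post-processing or storage.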

Solution 1
0 2020-07-14 16:11:45
