
Scrapy: returns empty list

So, I'm trying to build an OLX scraper and have run into a problem: I get responses in the Scrapy shell, but nothing ends up in the pipelines array.

My scraper:

import scrapy
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor
from olx.items import OlxItem


class ElectronicsSpider(CrawlSpider):
    name = "electronics"
    allowed_domains = ["www.olx.in"]
    start_urls = [
        'https://www.olx.in/computers-accessories/',
        'https://www.olx.in/tv-video-audio/',
        'https://www.olx.in/games-entertainment/'
    ]

    rules = (
        Rule(LinkExtractor(allow=(), restrict_css=('.pageNextPrev',)),
             callback="parse_item",
             follow=True),)

    def parse_item(self, response):
        item_links = response.css('.large > .detailsLink::attr(href)').extract()
        for a in item_links:
            yield scrapy.Request(a, callback=self.parse_detail_page)

    def parse_detail_page(self, response):
        title = response.css('h1::text').extract()[0].strip()
        price = response.css('.pricelabel > strong::text').extract()[0]

        item = OlxItem()
        item['title'] = title
        item['price'] = price
        item['url'] = response.url
        yield item
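One thing worth checking in `parse_item`: `scrapy.Request` only accepts absolute URLs, while `href` attributes are often relative. The question does not show what OLX's hrefs looked like, so this is an assumption, not a diagnosis. In a spider, `yield response.follow(a, callback=self.parse_detail_page)` or `scrapy.Request(response.urljoin(a), callback=self.parse_detail_page)` resolves a relative href against the page URL, and `Response.urljoin` is built on the standard library's `urllib.parse.urljoin`. The standard-library sketch below uses a hypothetical helper name and made-up hrefs to show that resolution:

```python
from urllib.parse import urljoin


def absolute_links(page_url, hrefs):
    """Resolve each href against the URL of the page it was found on.

    Roughly what response.urljoin(href) does for each link, except that
    this sketch ignores any <base> tag in the page.
    """
    return [urljoin(page_url, href) for href in hrefs]


# Hypothetical hrefs: a root-relative path, an absolute URL, a query-only link.
page = "https://www.olx.in/computers-accessories/"
links = absolute_links(page, [
    "/item/example-ad-ID1.html",
    "https://www.olx.in/item/other-ad-ID2.html",
    "?page=2",
])
for link in links:
    print(link)
```

Absolute hrefs pass through unchanged, so resolving every link this way is safe even when the page mixes both forms.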

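Running the command without logging, scrapy crawl --nolog electronics, produces no output at all. When I look at the log instead, it seems to show that an empty list is being passed to the item pipelines: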

...
2020-07-14 18:43:43 [scrapy.middleware] INFO: Enabled item pipelines:
[]
...
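For reference, the `Enabled item pipelines: []` line only reports which classes are listed in the project's `ITEM_PIPELINES` setting; it does not count scraped items, so an empty list there just means no pipeline is configured. A minimal sketch of enabling one, assuming the default `olx/pipelines.py` file created by `scrapy startproject olx`; the class name and its price clean-up are hypothetical, chosen to match price strings such as `'₹ 35,000'`:

```python
# olx/pipelines.py (hypothetical pipeline; the class name is illustrative)
class CleanPricePipeline:
    """Turn a scraped price string such as '₹ 1,55,000' into an integer."""

    def process_item(self, item, spider):
        raw = str(item.get("price") or "")
        # Keep only the digits: drops the currency sign, spaces and the
        # Indian-style thousands separators.
        digits = "".join(ch for ch in raw if ch.isdigit())
        # Overwrite the existing 'price' field so this also works with an
        # Item class that declares only title/price/url.
        item["price"] = int(digits) if digits else None
        return item  # a pipeline must return the item (or raise DropItem)


# settings.py: without an entry like this, the log shows
# "Enabled item pipelines: []"
ITEM_PIPELINES = {
    "olx.pipelines.CleanPricePipeline": 300,
}
```

The number (here 300) only sets the order in which enabled pipelines run, lower values first.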

I changed parse_detail_page to parse. Please see the documentation.

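    import scrapy
    from scrapy.spiders import CrawlSpider, Rule
    from scrapy.linkextractors import LinkExtractor
    # from olx.items import OlxItem  (not needed: plain dicts are yielded below)
    
    
    class ElectronicsSpider(CrawlSpider):
        name = "electronics"
        allowed_domains = ["www.olx.in"]
        start_urls = [
            'https://www.olx.in/computers-accessories/',
            'https://www.olx.in/tv-video-audio/',
            'https://www.olx.in/games-entertainment/'
        ]
    
        rules = (
            Rule(LinkExtractor(allow=(), restrict_css=('.pageNextPrev',)),
                 callback="parse_item",
                 follow=True),)
    
        def parse_item(self, response):
            item_links = response.css('.large > .detailsLink::attr(href)').extract()
            for a in item_links:
                # No callback given, so parse() (the default callback) handles these
                yield scrapy.Request(a)
    
        # Note: the Scrapy docs warn against using parse() in a CrawlSpider,
        # because CrawlSpider relies on it for its rule logic. Here it takes the
        # first title/price match on each page it receives; in the output below
        # those pages are the category pages themselves.
        def parse(self, response):
            title = response.css('span._2tW1I::text').extract()[0].strip()
            price = response.css('span._89yzn::text').extract()[0]
            print()  # debug: these produce the blank lines in the output below
            print()
            yield {
                'title': title,
                'price': price,
                'url': response.url
            }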
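Both spiders index `.extract()[0]`, which raises `IndexError` whenever a selector matches nothing, for instance after the site changes generated class names such as `_2tW1I`. Scrapy then logs the exception for that response and no item is yielded from it. In a spider, `response.css('span._2tW1I::text').get(default='').strip()` avoids this (`.get()` is the newer spelling of `.extract_first()`). The standard-library sketch below, with a hypothetical helper name, shows the same guard applied to the plain list of strings that `.extract()` returns:

```python
def first_text(matches, default=""):
    """Return the first extracted string, stripped, or `default` if there is none.

    `matches` is what .extract() / .getall() returns: a plain list of strings.
    Indexing matches[0] directly would raise IndexError on an empty list.
    """
    return matches[0].strip() if matches else default


title = first_text(["  Fiber splicing machine "])
missing = first_text([])               # '' instead of an IndexError
missing_none = first_text([], None)    # or an explicit sentinel
print(repr(title), repr(missing), missing_none)
```

Returning a default keeps the callback alive, so a missing field shows up as an empty value in the item rather than as a silently skipped page.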
Output:

    {'title': 'Fiber splicing machine', 'price': '₹ 1,55,000', 'url': 'https://www.olx.in/computers-laptops_c1505', 'field_name_for_your_processed_files': []}
    2020-07-14 22:04:26 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.olx.in/tv-video-audio_c1523> (referer: None)
    
    
    2020-07-14 22:04:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.olx.in/tv-video-audio_c1523>
    {'title': 'Digidesign 003 sound card', 'price': '₹ 35,000', 'url': 'https://www.olx.in/tv-video-audio_c1523', 'field_name_for_your_processed_files': []}
    2020-07-14 22:04:30 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.olx.in/games-entertainment_c93> (referer: None)
    
    
    2020-07-14 22:04:30 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.olx.in/games-entertainment_c93>
    {'title': 'ps4,ps3,xbox sales and services(S-GAMERSHOP)', 'price': '₹ 19,000', 'url': 'https://www.olx.in/games-entertainment_c93', 'field_name_for_your_processed_files': []}


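Notice: the technical posts on this site are published under the CC BY-SA 4.0 license. If you reproduce this content, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.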

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM