
Scrapy Bestbuy not extracting data

I'd like to know why Scrapy isn't extracting any data from the Best Buy website. Is there something wrong with my code?

import scrapy

class QuotesSpider(scrapy.Spider):
    name = 'bestbuy'
    start_url = ['https://www.bestbuy.com/site/promo/newly-discounted-outlet-products']

    def parse(self, response):
        title = response.css('div.sku-title a::text').extract()
        yield title

Here is the output when I run scrapy crawl bestbuy -o bestbuy.csv:

2020-02-10 06:04:22 [scrapy.core.engine] INFO: Spider opened
2020-02-10 06:04:22 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2020-02-10 06:04:22 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2020-02-10 06:04:22 [scrapy.core.engine] INFO: Closing spider (finished)
2020-02-10 06:04:22 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'elapsed_time_seconds': 0.017988,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2020, 2, 10, 12, 4, 22, 251711),
 'log_count/INFO': 10,
 'start_time': datetime.datetime(2020, 2, 10, 12, 4, 22, 233723)}
2020-02-10 06:04:22 [scrapy.core.engine] INFO: Spider closed (finished)

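The reason it works in the Scrapy shell but not in your spider is that you forgot the 's' at the end of 'start_urls'. Scrapy's default start_requests() builds the initial requests only from the start_urls attribute; start_url is just an unrelated attribute that nothing reads, so no request is ever scheduled. That is why the log shows "Crawled 0 pages" and the spider closes immediately.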
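A simplified model may make the failure mode clearer. The classes below are hypothetical stand-ins written for illustration (MiniSpider, TypoSpider and FixedSpider are not Scrapy classes, and "requests" are plain tuples). The only behaviour they mirror is that the base class, like Scrapy's default start_requests(), reads start_urls and nothing else, so the misspelled attribute is silently ignored.

```python
# Simplified, hypothetical model of how a spider picks its first URLs.
# This is NOT Scrapy's source code; it only mirrors the relevant rule:
# initial requests are generated from 'start_urls' and nothing else.


class MiniSpider:
    name = None
    start_urls = []  # default: nothing to crawl

    def start_requests(self):
        # Only 'start_urls' is consulted; other attributes are ignored.
        for url in self.start_urls:
            yield ("GET", url)


class TypoSpider(MiniSpider):
    name = "bestbuy"
    # Misspelled: defines a new attribute that start_requests() never reads.
    start_url = ["https://www.bestbuy.com/site/promo/newly-discounted-outlet-products"]


class FixedSpider(MiniSpider):
    name = "bestbuy"
    start_urls = ["https://www.bestbuy.com/site/promo/newly-discounted-outlet-products"]


if __name__ == "__main__":
    print(len(list(TypoSpider().start_requests())))   # 0 requests
    print(len(list(FixedSpider().start_requests())))  # 1 request
```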

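This should work (besides adding the 's', it yields one dict per title, because a Scrapy callback should yield items such as dicts, not a bare list):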

import scrapy


class QuotesSpider(scrapy.Spider):
    name = 'bestbuy'
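    # Must be 'start_urls' (plural); the default start_requests() reads only this.
    start_urls = [
        'https://www.bestbuy.com/site/promo/newly-discounted-outlet-products']

    def parse(self, response):
        # Yield one item (a dict) per product title found on the page.
        for title in response.css('h4 > a::text').getall():
            yield {"title": title}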
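To catch this kind of typo before running a crawl, one option is a small check like the one below. find_attr_typos and KNOWN_SPIDER_ATTRS are hypothetical helpers, not part of Scrapy; the sketch uses difflib.get_close_matches from the standard library to flag class attributes that look like misspellings of a few common spider attribute names.

```python
import difflib

# Hypothetical list of spider attribute names to guard against typos.
# It covers a few common scrapy.Spider attributes; extend it as needed.
KNOWN_SPIDER_ATTRS = ["name", "start_urls", "allowed_domains", "custom_settings"]


def find_attr_typos(spider_cls, known=KNOWN_SPIDER_ATTRS, cutoff=0.85):
    """Return {suspicious_attr: suggested_name} for attributes defined in
    spider_cls's own body that closely resemble a known attribute name."""
    suspects = {}
    for attr in vars(spider_cls):
        # Skip dunder/private names and names that are already correct.
        if attr.startswith("_") or attr in known:
            continue
        matches = difflib.get_close_matches(attr, known, n=1, cutoff=cutoff)
        if matches:
            suspects[attr] = matches[0]
    return suspects


if __name__ == "__main__":
    # Plain stand-in for the question's spider; a real scrapy.Spider
    # subclass can be passed the same way.
    class QuotesSpider:
        name = "bestbuy"
        start_url = ["https://www.bestbuy.com/site/promo/newly-discounted-outlet-products"]

        def parse(self, response):
            pass

    print(find_attr_typos(QuotesSpider))  # {'start_url': 'start_urls'}
```

The cutoff of 0.85 is an arbitrary choice: "start_url" scores about 0.95 against "start_urls", while unrelated names such as "parse" score far lower.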

