Scrapy Bestbuy not extracting data
I'd like to know why Scrapy is not extracting any data from the Best Buy website. Is there something wrong with my code?
    import scrapy


    class QuotesSpider(scrapy.Spider):
        name = 'bestbuy'
        start_url = ['https://www.bestbuy.com/site/promo/newly-discounted-outlet-products']

        def parse(self, response):
            title = response.css('div.sku-title a::text').extract()
            yield title
This is the output I get when I run scrapy crawl bestbuy -o bestbuy.csv:
2020-02-10 06:04:22 [scrapy.core.engine] INFO: Spider opened
2020-02-10 06:04:22 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2020-02-10 06:04:22 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2020-02-10 06:04:22 [scrapy.core.engine] INFO: Closing spider (finished)
2020-02-10 06:04:22 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'elapsed_time_seconds': 0.017988,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2020, 2, 10, 12, 4, 22, 251711),
'log_count/INFO': 10,
'start_time': datetime.datetime(2020, 2, 10, 12, 4, 22, 233723)}
2020-02-10 06:04:22 [scrapy.core.engine] INFO: Spider closed (finished)
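It works in the shell but not in your spider because you left the 's' off the end of 'start_urls'. Scrapy only reads the attribute named start_urls; your class defines start_url instead, so the spider has no start URLs, schedules no requests, and closes right away. That matches the "Crawled 0 pages" line in your log.
This should work: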
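    import scrapy


    class QuotesSpider(scrapy.Spider):
        name = 'bestbuy'
        # Note the trailing 's': Scrapy reads start_urls, not start_url.
        start_urls = [
            'https://www.bestbuy.com/site/promo/newly-discounted-outlet-products']

        def parse(self, response):
            # Yield one dict per title; a callback must yield items or requests, not a bare list.
            for title in response.css('h4 > a::text').getall():
                yield {"title": title}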
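
Because a misspelled class attribute fails silently like this, a small check can help catch it before a crawl. The sketch below is only an illustration using the standard library: `find_suspect_attrs` and `KNOWN_SPIDER_ATTRS` are hypothetical names, not part of Scrapy, and the list of attribute names is an assumption chosen for this example rather than an exhaustive one. `BrokenSpider` is a plain class standing in for a `scrapy.Spider` subclass so the snippet runs without Scrapy installed.

```python
import difflib

# Attribute names a spider class commonly defines. This short list is an
# assumption for illustration, not an exhaustive list taken from Scrapy.
KNOWN_SPIDER_ATTRS = ("name", "start_urls", "allowed_domains", "custom_settings")


def find_suspect_attrs(spider_cls, known=KNOWN_SPIDER_ATTRS, cutoff=0.8):
    """Return {attribute: suggestion} for class attributes that look like typos."""
    suspects = {}
    for attr in vars(spider_cls):
        # Skip dunder/private names and attributes that are spelled correctly.
        if attr.startswith("_") or attr in known:
            continue
        match = difflib.get_close_matches(attr, known, n=1, cutoff=cutoff)
        if match:
            suspects[attr] = match[0]
    return suspects


class BrokenSpider:  # stand-in for a scrapy.Spider subclass
    name = "bestbuy"
    start_url = ["https://www.bestbuy.com/site/promo/newly-discounted-outlet-products"]

    def parse(self, response):
        pass


print(find_suspect_attrs(BrokenSpider))  # {'start_url': 'start_urls'}
```

Passing your real spider class to the helper should work the same way, since it only inspects the attributes defined directly on the class.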