Scrapy outputs an empty JSON / CSV file

Question

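I'm very new to Scrapy and Python and could really use some help. I have this code, which works when run from the command line: I can see it extracting all the correct information as it crawls the different pages.

My problem is that when I try to save the script's output to a file, the file comes out empty. I've looked through many other questions here but couldn't find anything that helped.

Here is the code: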

import scrapy
from urlparse import urljoin


class Aberdeenlocations1Spider(scrapy.Spider):
    name = "aberdeenlocations2"
    start_urls = [
        'http://brighthouse.co.uk/store-finder/all-stores',
    ]

    def parse(self, response):
        products = response.xpath('//ul/li/a/@href').extract()
        for p in products:
            url = urljoin(response.url, p)
            yield scrapy.Request(url, callback=self.parse_product)

    def parse_product(self, response):
        for div in response.css('div'):
          yield {
               title: (response.css('title::text').extract()),
               address: (response.css('[itemprop=streetAddress]::text').extract()),
               locality: (response.css('[itemprop=addressLocality]::text').extract()),
               region: (response.css('[itemprop=addressRegion]::text').extract()),
               postcode: (response.css('[itemprop=postalCode]::text').extract()),
               telephone: (response.css('[itemprop=telephone]::text').extract()),
               script: (response.xpath('//div/script').extract()),
               gmaplink: (response.xpath('//div/div/div/p/a/@href').extract_first())
                  }

Then I run this command on the script above:

scrapy crawl aberdeenlocations2 -o data.json

What am I doing wrong?

Answer 1

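I think you just have a few Python errors in your yield: the dictionary keys need to be quoted strings. Written like this, I get some data in the output: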

import scrapy
from urllib.parse import urljoin  # Python 3 (on Python 2: from urlparse import urljoin)


class Aberdeenlocations1Spider(scrapy.Spider):
    name = "aberdeenlocations2"
    start_urls = [
        'http://brighthouse.co.uk/store-finder/all-stores',
    ]

    def parse(self, response):
        products = response.xpath('//ul/li/a/@href').extract()
        for p in products:
            url = urljoin(response.url, p)
            yield scrapy.Request(url, callback=self.parse_product)

    def parse_product(self, response):
        # Not sure why this loop is there: the body never uses `div`,
        # so the same item is yielded once for every <div> on the page.
        for div in response.css('div'):
            yield {
                'title': response.css('title::text').extract(),
                'address': response.css('[itemprop=streetAddress]::text').extract(),
                'locality': response.css('[itemprop=addressLocality]::text').extract(),
                'region': response.css('[itemprop=addressRegion]::text').extract(),
                'postcode': response.css('[itemprop=postalCode]::text').extract(),
                'telephone': response.css('[itemprop=telephone]::text').extract(),
                'script': response.xpath('//div/script').extract(),
                'gmaplink': response.xpath('//div/div/div/p/a/@href').extract_first(),
            }
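
To see why the quotes matter, here is a minimal sketch in plain Python with no Scrapy involved. The two helper functions and the sample value "Store name" are made up for illustration. A bare `title` inside a dict literal is looked up as a variable, so building the dict raises a NameError before anything can be yielded. In a spider callback that presumably means no items are produced, which would leave the output file empty.

```python
def build_item_unquoted():
    # Mirrors the dict in the question: `title` is a bare name, not a string,
    # so Python tries to look up a variable called `title`.
    return {title: "Store name"}


def build_item_quoted():
    # Mirrors the dict in this answer: 'title' is a string key.
    return {'title': "Store name"}


try:
    build_item_unquoted()
    error_message = None
except NameError as exc:
    # The lookup fails because no variable named `title` exists.
    error_message = str(exc)

item = build_item_quoted()

print("unquoted key ->", error_message)
print("quoted key   ->", item)
```

If that is what is happening in your spider, the crawl log should show the NameError traceback for each store page.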
