
Saving scrapy results into csv file

I'm having some problems with the web crawler I wrote. I want to save the data that I fetch. If I understood the scrapy tutorial right, I just need to yield the data and then start the crawler with scrapy crawl <crawler> -o file.csv -t csv, right? For some reason the file remains empty. Here's my code:

# -*- coding: utf-8 -*-
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor

class PaginebiancheSpider(CrawlSpider):
    name = 'paginebianche'
    allowed_domains = ['paginebianche.it']
    start_urls = ['https://www.paginebianche.it/aziende-clienti/lombardia/milano/comuni.htm']

    rules = (
        Rule(LinkExtractor(allow=(), restrict_css=('.seo-list-name', '.seo-list-name-up')),
             callback="parse_item",
             follow=True),)

    def parse_item(self, response):
        if(response.xpath("//h2[@class='rgs']//strong//text()") != [] and response.xpath("//span[@class='value'][@itemprop='telephone']//text()") != []):
            yield ' '.join(response.xpath("//h2[@class='rgs']//strong//text()").extract()) + " " + response.xpath("//span[@class='value'][@itemprop='telephone']//text()").extract()[0].strip(),

I'm using Python 2.7.

If you look at the spider's output, you will see a bunch of error messages like this one being logged:

2018-10-20 13:47:52 [scrapy.core.scraper] ERROR: Spider must return Request, BaseItem, dict or None, got 'tuple' in <GET https://www.paginebianche.it/lombardia/abbiategrasso/vivai-padovani.html>

What this means is that you're not yielding the correct thing - you need dicts or Items, not the single-item tuples you're creating.
Something as simple as this should work:

yield {
    'name': response.xpath("normalize-space(//h2[@class='rgs'])").get(),
    'phone': response.xpath("//span[@itemprop='telephone']/text()").get()
}
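
For reference, here is a minimal sketch of how the full callback could look with that fix applied, assuming the same page structure as in the question; the field names name and phone are simply the ones used in the snippet above:

def parse_item(self, response):
    name = response.xpath("normalize-space(//h2[@class='rgs'])").get()
    phone = response.xpath("//span[@itemprop='telephone']/text()").get()
    # Only yield a row when both pieces of data were found on the page
    if name and phone:
        # A dict (or an Item) is something the feed exporter can serialize,
        # unlike the single-element tuple from the original code
        yield {
            'name': name,
            'phone': phone.strip(),
        }

With that in place, running scrapy crawl paginebianche -o file.csv should write one row per yielded dict, with the dict keys (name, phone) becoming the CSV column headers.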
