Scrapy CSV column export

Question

I'd like to export data to several columns in a CSV file, but I always get this kind of file:

[image: csv]

I'd like to get two columns: one for "articulo" and another for "price".

My pipelines:

import scrapy
from scrapy import signals
# scrapy.contrib.exporter is the deprecated pre-1.0 location of this class.
from scrapy.exporters import CsvItemExporter
import csv


class MercadoPipeline(object):
    def __init__(self):
        # Maps each running spider to its open output file.
        self.files = {}

    @classmethod
    def from_crawler(cls, crawler):
        pipeline = cls()
        crawler.signals.connect(pipeline.spider_opened, signals.spider_opened)
        crawler.signals.connect(pipeline.spider_closed, signals.spider_closed)
        return pipeline

    # These methods belong at class level, not nested under from_crawler.
    def spider_opened(self, spider):
        # The exporter writes bytes, so the file is opened in binary mode.
        file = open('%s_items.csv' % spider.name, 'w+b')
        self.files[spider] = file
        self.exporter = CsvItemExporter(file)
        self.exporter.fields_to_export = ['articulo', 'precio']
        self.exporter.start_exporting()

    def spider_closed(self, spider):
        self.exporter.finish_exporting()
        file = self.files.pop(spider)
        file.close()

    def process_item(self, item, spider):
        self.exporter.export_item(item)
        return item

Can you help me, please?

Answer 1

Here you are:

import scrapy
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor
from scrapy.exceptions import CloseSpider
from mercado.items import MercadoItem


class MercadoSpider(CrawlSpider):
    name = 'mercado'
    item_count = 0
    # Scrapy reads the plural attribute name, allowed_domains.
    allowed_domains = ['www.autodoc.es']
    start_urls = ['https://www.autodoc.es/search?brandNo%5B0%5D=101']

    rules = (
        # Follow the pagination links.
        Rule(LinkExtractor(allow=(), restrict_xpaths=('//span[@class="next"]/a'))),
        # Visit each product link and parse the product page.
        Rule(LinkExtractor(allow=(), restrict_xpaths=('//a[@class="ga-click"]')),
             callback='parse_item', follow=False),
    )

    def parse_item(self, response):
        ml_item = MercadoItem()

        # Product info
        ml_item['articulo'] = response.xpath('normalize-space(//*[@id="content"]/div[4]/div[2]/div[1]/div[1]/div/span[1]/span/text())').extract()
        ml_item['precio'] = response.xpath('normalize-space(//*[@id="content"]/div[4]/div[3]/div[2]/p[2]/text())').extract()
        self.item_count += 1
        # Stop the crawl once more than 20 items have been scraped.
        if self.item_count > 20:
            raise CloseSpider('item_exceeded')
        yield ml_item

Answer 2

There is nothing wrong with the output of your code.
You are getting the two CSV columns you want, but the program you are using to view the data is not interpreting the file correctly.
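
One way to check this independently of any spreadsheet program is to parse the file with Python's csv module, which also uses a comma as its default delimiter. The following is only a minimal sketch: the sample rows and the helper name count_columns are made up for illustration, and the text stands in for the file the pipeline writes.

```python
import csv
import io


def count_columns(csv_text, delimiter=','):
    """Return the set of column counts found across all non-empty rows."""
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    return {len(row) for row in reader if row}


# Hypothetical sample in the shape the pipeline would write.
sample = 'articulo,precio\r\nFiltro de aceite,12.50\r\n'

print(count_columns(sample))        # parsed as comma-separated: {2}
print(count_columns(sample, ';'))   # parsed as semicolon-separated: {1}
```

If the comma-based parse reports two columns while the viewer shows one, the viewer is most likely splitting on a different character.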

By default, CsvItemExporter uses a comma (,) as the delimiter, while the program seems to expect something else (and possibly different quoting as well).
There are two ways to solve your problem:

Change the program's settings so that it reads the file correctly.
Change the way CsvItemExporter exports data (it passes any additional keyword arguments to the underlying csv.writer object).
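
The second option can be sketched with the standard library alone, since the extra keyword arguments end up in csv.writer. The example below writes the same hypothetical rows once with the default comma and once with a semicolon, which some spreadsheet programs expect depending on regional settings. The helper name write_rows and the sample data are assumptions for illustration. In the pipeline from the question, the corresponding change would presumably be CsvItemExporter(file, delimiter=';').

```python
import csv
import io


def write_rows(rows, **writer_kwargs):
    """Serialize rows to CSV text, forwarding keyword arguments to csv.writer."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, **writer_kwargs)
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical items; the header names match fields_to_export in the question.
rows = [['articulo', 'precio'], ['Filtro de aceite', '12,50']]

default_output = write_rows(rows)                   # comma-delimited
semicolon_output = write_rows(rows, delimiter=';')  # semicolon-delimited

print(default_output)
print(semicolon_output)
```

With the default delimiter, the price has to be quoted because it contains a comma itself; with a semicolon, no quoting is needed for these rows.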

Answer 1: 2018-02-27 06:27:23

Answer 2: 2018-02-27 09:38:36
