Scrapy CSV column export

I'd like to export data to several columns in a CSV file, but I always get this kind of file:

[image: csv]

I'd like to get two columns, one for "articulo" and another for "price".

My pipelines:

import scrapy
from scrapy import signals
from scrapy.exporters import CsvItemExporter  # scrapy.contrib.exporter is the old, deprecated path
import csv

class MercadoPipeline(object):
    def __init__(self):
        self.files = {}

    @classmethod
    def from_crawler(cls, crawler):
        pipeline = cls()
        crawler.signals.connect(pipeline.spider_opened, signals.spider_opened)
        crawler.signals.connect(pipeline.spider_closed, signals.spider_closed)
        return pipeline

    def spider_opened(self, spider):
        # One output file per spider, e.g. mercado_items.csv
        file = open('%s_items.csv' % spider.name, 'w+b')
        self.files[spider] = file
        self.exporter = CsvItemExporter(file)
        self.exporter.fields_to_export = ['articulo', 'precio']
        self.exporter.start_exporting()

    def spider_closed(self, spider):
        self.exporter.finish_exporting()
        file = self.files.pop(spider)
        file.close()

    def process_item(self, item, spider):
        self.exporter.export_item(item)
        return item

Can you help me, please?

Here you are:

import scrapy
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor
from scrapy.exceptions import CloseSpider
from mercado.items import MercadoItem


class MercadoSpider(CrawlSpider):
    name = 'mercado'
    item_count = 0
    allowed_domains = ['www.autodoc.es']
    start_urls = ['https://www.autodoc.es/search?brandNo%5B0%5D=101']

    rules = (
        # Follow the "next" pagination links
        Rule(LinkExtractor(allow=(), restrict_xpaths=('//span[@class="next"]/a'))),
        # Hand each matched link to parse_item
        Rule(LinkExtractor(allow=(), restrict_xpaths=('//a[@class="ga-click"]')),
             callback='parse_item', follow=False),
    )

    def parse_item(self, response):
        ml_item = MercadoItem()

        # Product info
        ml_item['articulo'] = response.xpath('normalize-space(//*[@id="content"]/div[4]/div[2]/div[1]/div[1]/div/span[1]/span/text())').extract()
        ml_item['precio'] = response.xpath('normalize-space(//*[@id="content"]/div[4]/div[3]/div[2]/p[2]/text())').extract()
        self.item_count += 1
        if self.item_count > 20:
            raise CloseSpider('item_exceeded')
        yield ml_item

There is nothing wrong with the output of your code.
You are getting the two CSV columns you want, but the program you are using to view the data is not interpreting the file correctly.
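
One way to check this without relying on a spreadsheet program is to parse the data with Python's csv module, which reads comma-delimited input by default. The following is only a sketch: the sample text stands in for the exported file, and its rows are made-up placeholders, not real output from the spider.

```python
import csv
import io

# Hypothetical stand-in for the contents of the exported file;
# the values are placeholders, not real scraped data.
sample = 'articulo,precio\r\nFiltro de aceite,"5,49"\r\nBujia,"3,20"\r\n'


def read_columns(text):
    """Parse comma-delimited CSV text into a list of dicts keyed by header."""
    return list(csv.DictReader(io.StringIO(text, newline="")))


rows = read_columns(sample)
for row in rows:
    # Each row exposes exactly the two named columns.
    print(row["articulo"], "|", row["precio"])
```

To run the same check against the real export, the text could be read from the file the pipeline writes (mercado_items.csv for this spider), opened with newline="".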

By default, CsvItemExporter uses a comma (,) as the delimiter, and the program seems to expect something else (and possibly different quoting as well).
There are two ways to solve your problem:

  • Change the program's settings so that it reads the file correctly
  • Change the way CsvItemExporter exports the data (it passes any additional keyword arguments on to the underlying csv.writer object)
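
As a rough illustration of the second option, the sketch below uses only the standard library to show what a different delimiter does to the output. Because CsvItemExporter forwards extra keyword arguments to csv.writer, the same argument could be given in the pipeline as CsvItemExporter(file, delimiter=';'). The semicolon is an assumption about what the viewing program expects, and the helper name to_csv and the sample rows are placeholders.

```python
import csv
import io


def to_csv(rows, **writer_kwargs):
    """Write rows with csv.writer, forwarding keyword arguments unchanged."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer, **writer_kwargs)
    writer.writerows(rows)
    return buffer.getvalue()


# Placeholder data, not real scraped values.
rows = [["articulo", "precio"], ["Filtro de aceite", "5,49"]]

default_output = to_csv(rows)                   # comma-delimited (the default)
semicolon_output = to_csv(rows, delimiter=";")  # semicolon-delimited

print(default_output)
print(semicolon_output)
```

With the default comma delimiter, a value that itself contains a comma has to be quoted; with a semicolon delimiter it does not, which is one reason a viewer configured for one dialect can misread the other.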

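Notice: technical posts on this site are published under the CC BY-SA 4.0 license. If you republish this post, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.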

 