Scrapy CSV column export

I'd like to export data to several columns in a CSV file, but I always get this kind of file:

[image: csv]

I'd like to get two columns: one for "articulo" and another for "precio".

My pipelines:

from scrapy import signals
from scrapy.exporters import CsvItemExporter


class MercadoPipeline(object):
    def __init__(self):
        # Open file handles, keyed by spider
        self.files = {}

    @classmethod
    def from_crawler(cls, crawler):
        pipeline = cls()
        crawler.signals.connect(pipeline.spider_opened, signals.spider_opened)
        crawler.signals.connect(pipeline.spider_closed, signals.spider_closed)
        return pipeline

    def spider_opened(self, spider):
        # The exporter writes bytes, so the file is opened in binary mode
        file = open('%s_items.csv' % spider.name, 'w+b')
        self.files[spider] = file
        self.exporter = CsvItemExporter(file)
        self.exporter.fields_to_export = ['articulo', 'precio']
        self.exporter.start_exporting()

    def spider_closed(self, spider):
        self.exporter.finish_exporting()
        file = self.files.pop(spider)
        file.close()

    def process_item(self, item, spider):
        self.exporter.export_item(item)
        return item

Can you help me please?

Here is the spider:

from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor
from scrapy.exceptions import CloseSpider
from mercado.items import MercadoItem


class MercadoSpider(CrawlSpider):
    name = 'mercado'
    item_count = 0
    allowed_domains = ['www.autodoc.es']
    start_urls = ['https://www.autodoc.es/search?brandNo%5B0%5D=101']

    rules = (
        # Follow the pagination links
        Rule(LinkExtractor(allow=(), restrict_xpaths=('//span[@class="next"]/a'))),
        # Parse each product page
        Rule(LinkExtractor(allow=(), restrict_xpaths=('//a[@class="ga-click"]')),
             callback='parse_item', follow=False),
    )

    def parse_item(self, response):
        ml_item = MercadoItem()

        # Product info
        ml_item['articulo'] = response.xpath('normalize-space(//*[@id="content"]/div[4]/div[2]/div[1]/div[1]/div/span[1]/span/text())').extract()
        ml_item['precio'] = response.xpath('normalize-space(//*[@id="content"]/div[4]/div[3]/div[2]/p[2]/text())').extract()
        self.item_count += 1
        # Stop the crawl after 20 items
        if self.item_count > 20:
            raise CloseSpider('item_exceeded')
        yield ml_item

There is nothing wrong with the output of your code.
You are getting the two csv columns you want, but the program you are using to view the data is not interpreting it correctly.

By default, CsvItemExporter uses a comma (,) as the delimiter, while the program you open the file with seems to expect a different one (and possibly different quoting as well).
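
To illustrate the mismatch, here is a minimal sketch using only Python's standard csv module. The sample text is a hypothetical stand-in for what the exporter writes (the values are made up). Parsed with a comma delimiter each line splits into two fields, but a reader that expects a semicolon sees a single field per line, which is probably what your viewer is doing:

```python
import csv
import io

# Hypothetical sample of exported data: a header line plus one item
sample = 'articulo,precio\r\nFiltro de aceite,12.50\r\n'


def parse(text, delimiter):
    """Parse CSV text with the given delimiter and return the rows as lists."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


as_commas = parse(sample, ',')      # the delimiter the file was written with
as_semicolons = parse(sample, ';')  # what a viewer expecting ';' would see

print(as_commas)
print(as_semicolons)
```

With the wrong delimiter nothing is lost, the whole line simply lands in the first column.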
There are two possibilities to solve your problem:

  • Change the program's settings so it reads the file correctly
  • Change the way CsvItemExporter exports data (it will pass any additional keyword arguments to the underlying csv.writer object)
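
For the second option, the change in your pipeline would be something like CsvItemExporter(file, delimiter=';'), assuming a semicolon is what your program expects (I can't tell that from the question). Since the keyword argument ends up in csv.writer, its effect can be sketched with the standard library alone. The helper name and the sample row below are hypothetical and only there for illustration:

```python
import csv
import io


def rows_to_csv(rows, **writer_kwargs):
    """Write a header and the given rows, forwarding extra options to csv.writer."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, **writer_kwargs)
    writer.writerow(['articulo', 'precio'])
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical item; the price uses a decimal comma on purpose
rows = [('Filtro de aceite', '12,50')]

default_output = rows_to_csv(rows)                   # comma-delimited
semicolon_output = rows_to_csv(rows, delimiter=';')  # semicolon-delimited

print(default_output)
print(semicolon_output)
```

Note that with the default comma delimiter the writer has to quote a value such as 12,50, while with a semicolon delimiter it is written as is.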
