
Organizing CSV export with Scrapy

For exporting my data to a CSV file, I'm currently using (mainly because I never understood pipelines that well):

custom_settings = {
    'FEED_FORMAT': 'csv',
    'FEED_URI': 'datosAmazon.csv',
}

These custom settings are inside my spider.

Right now, I'm scraping different categories of items, for example, laptops and cell phones.

The problem is that when I check my data, the items are not grouped: a laptop may appear, then a cell phone, then two laptops, another cell phone, and so on.

I'm currently going through the different categories this way:

def start_requests(self):
    keywords = ['laptop', 'cellphone']
    for keyword in keywords:
        yield Request(self.search_url.format(keyword))

Is there a way to make the data more organized (two files would be even better), or an easy pipeline solution?

There is no settings-only way to achieve what you want.

That said, exporting to multiple files from a custom pipeline is pretty straightforward:

  • Create multiple exporters (scrapy.exporters.CsvItemExporter) in the open_spider method (probably store them in a dict)
  • Select the correct exporter (based on the item) in the process_item method and call its export_item method
  • Close the files in the close_spider method

Don't forget to activate your pipeline :)
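
Activating a pipeline means listing it under ITEM_PIPELINES, either in settings.py or in the spider's custom_settings. A small sketch of the latter, where the module path and class name (myproject.pipelines.PerCategoryCsvPipeline) are placeholders for wherever your pipeline class actually lives:

```python
# Goes inside the spider class, in place of the FEED_* settings.
custom_settings = {
    "ITEM_PIPELINES": {
        # Placeholder import path; the number (0-1000) sets the run order.
        "myproject.pipelines.PerCategoryCsvPipeline": 300,
    },
}
```

If the FEED_FORMAT and FEED_URI settings are left in place as well, the combined datosAmazon.csv will still be written alongside the per-category files.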
