
Writing data into multiple sheets in a single CSV file from Python/Scrapy (Python framework)

I am using the Scrapy framework and fetch data from two URLs with two separate spider files.

At the moment, when I run spider1 for url1, the scraped data is saved to a csv1 file, and when I run spider2, its data is saved to a csv2 file.

What I actually want is to save the data from all the spiders into a single CSV file, with one sheet per spider (each sheet named after its spider).

In short, my question is: how do I write data into multiple sheets of a single CSV file from Python?

pipeline.py

from w3c_browser.items import WCBrowserItem
import csv
from csv import DictWriter
from cStringIO import StringIO
from datetime import datetime
from scrapy.xlib.pydispatch import dispatcher
from scrapy import signals
from scrapy import log

class W3CBrowserPipeline(object):
    def __init__(self):
        dispatcher.connect(self.spider_opened, signal=signals.spider_opened)
        dispatcher.connect(self.spider_closed, signal=signals.spider_closed)
        self.brandCategoryCsv = csv.writer(open('wcbbrowser.csv', 'wb'))

    def spider_opened(self, spider):
        spider.started_on = datetime.now()
        if spider.name == 'browser_statistics':
            log.msg("opened spider  %s at time %s" % (spider.name,datetime.now().strftime('%H-%M-%S')))
            self.brandCategoryCsv = csv.writer(open("csv/%s-%s.csv"% (spider.name,datetime.now().strftime('%d%m%y')), "wb"),
                       delimiter=',', quoting=csv.QUOTE_MINIMAL)
        elif spider.name == 'browser_os':
            log.msg("opened spider  %s at time %s" % (spider.name,datetime.now().strftime('%H-%M-%S')))
            self.brandCategoryCsv = csv.writer(open("csv/%s-%s.csv"% (spider.name,datetime.now().strftime('%d%m%y')), "wb"),
                       delimiter=',', quoting=csv.QUOTE_MINIMAL)
        elif spider.name == 'browser_display':
            log.msg("opened spider  %s at time %s" % (spider.name,datetime.now().strftime('%H-%M-%S')))
            self.brandCategoryCsv = csv.writer(open("csv/%s-%s.csv"% (spider.name,datetime.now().strftime('%d%m%y')), "wb"),
                       delimiter=',', quoting=csv.QUOTE_MINIMAL)

    def process_item(self, item, spider):
        # Write one row per item; the column order depends on the spider.
        if spider.name == 'browser_statistics':
            self.brandCategoryCsv.writerow([item['year'],
                                            item['internet_explorer'],
                                            item['firefox'],
                                            item['chrome'],
                                            item['safari'],
                                            item['opera'],
            ])
        elif spider.name == 'browser_os':
            self.brandCategoryCsv.writerow([item['year'],
                                            item['vista'],
                                            item['nt'],
                                            item['winxp'],
                                            item['linux'],
                                            item['mac'],
                                            item['mobile'],
            ])
        # Always return the item so that later pipelines still receive it.
        return item

    def spider_closed(self, spider):
        log.msg("closed spider %s at %s" % (spider.name,datetime.now().strftime('%H-%M-%S')))
        work_time = datetime.now() - spider.started_on
        print str(work_time),"Total Time taken by the spider to run>>>>>>>>>>>"

I don't know of a nifty built-in way to do this with Scrapy from the command line, but it is fairly simple to create your own pipeline. The pipeline could open the same file for all your spiders and write a separate sheet for each spider. You would have to implement this logic yourself. Note that plain CSV has no concept of sheets, so the shared file would need to be a spreadsheet workbook format such as .xlsx.
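
As a rough sketch of that idea, the pipeline below buffers each spider's rows and, when the spider closes, writes them to a sheet named after the spider in one shared workbook. It is only an illustration under several assumptions: the output is an .xlsx file written with the third-party openpyxl package (swapped in for the csv module used in the question), and the names FIELDS, write_sheet, WorkbookPipeline and wcbbrowser.xlsx are hypothetical. The column lists are copied from the question's pipeline; browser_display is left out because its fields are not shown there.

```python
import os

# Column order per spider, copied from the field names in the question.
# 'browser_display' is omitted because its fields are not shown there.
FIELDS = {
    'browser_statistics': ['year', 'internet_explorer', 'firefox',
                           'chrome', 'safari', 'opera'],
    'browser_os': ['year', 'vista', 'nt', 'winxp', 'linux', 'mac', 'mobile'],
}


def write_sheet(path, sheet_name, header, rows):
    """Write header and rows to the sheet `sheet_name` of the workbook at `path`.

    Sheets that already exist under other names are left untouched.
    """
    # openpyxl is a third-party package (pip install openpyxl). It is imported
    # here so that the rest of this module loads even when it is missing.
    from openpyxl import Workbook, load_workbook

    if os.path.exists(path):
        wb = load_workbook(path)
    else:
        wb = Workbook()
        # A new Workbook starts with one default sheet; drop it.
        wb.remove(wb.active)

    # Replace a sheet left over from an earlier run of the same spider.
    if sheet_name in wb.sheetnames:
        wb.remove(wb[sheet_name])
    ws = wb.create_sheet(title=sheet_name)

    ws.append(header)
    for row in rows:
        ws.append(row)
    wb.save(path)


class WorkbookPipeline(object):
    """Hypothetical pipeline: buffer rows, then write one sheet per spider."""

    path = 'wcbbrowser.xlsx'  # assumed output file name

    def open_spider(self, spider):
        self.rows = []

    def process_item(self, item, spider):
        fields = FIELDS.get(spider.name)
        if fields:
            # Missing fields become empty cells instead of raising KeyError.
            self.rows.append([item.get(name, '') for name in fields])
        return item

    def close_spider(self, spider):
        fields = FIELDS.get(spider.name)
        if fields:
            write_sheet(self.path, spider.name, fields, self.rows)
```

Because each `scrapy crawl` run is a separate process, the sketch reopens the existing workbook on every close instead of keeping one file handle open. It assumes the spiders run one after another; two spiders saving the same file at the same time could overwrite each other's sheets. The pipeline would still have to be enabled through the project's ITEM_PIPELINES setting.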

