
Scrapy (Python): Overwriting pre-existing data with same item name

I was wondering if there is a way to replace old data with new data where the items are the same with Scrapy.

For example, say I had scraped stock data from Yahoo Finance for a particular stock. Later, after new data has been released, I want to update that stock's data in the same output.csv file I used before.

I'm somewhat surprised that this isn't something Scrapy already does from its command line (or it does, and I am just blind and can't find it).

I was thinking of maybe configuring pipelines.py to do the trick:

# pipelines.py:

import csv

class stockPipeline(object):
    # Scrapy calls process_item() for every scraped item.
    def process_item(self, item, spider):
        # Scan the existing CSV for a row with the same stock name.
        with open('output.csv', 'rt', newline='') as f:
            reader = csv.DictReader(f)
            for stock in reader:
                if stock['name'] == item['name']:
                    # Somehow get scrapy to overwrite this particular row...
                    # Or, maybe get DictWriter to do it for us instead of scrapy??
                    pass
        return item

Have the data pipelined into a real database and your problem would be solved right away.

For instance, let's say you switch to MySQL. In that case the problem simply comes down to doing an insert if the record does not exist and an update otherwise.
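The insert-or-update idea can be sketched as a small item pipeline. This is only an illustration under some assumptions: it uses SQLite from the standard library as a stand-in for MySQL (where `INSERT ... ON DUPLICATE KEY UPDATE` plays a similar role), and the class name `SQLiteUpsertPipeline`, the `stocks.db` file, the `stocks` table and the `price` field are all hypothetical names, not taken from the question.

```python
import sqlite3


class SQLiteUpsertPipeline:
    """Hypothetical Scrapy item pipeline that keeps one row per stock name."""

    def __init__(self, db_path="stocks.db"):
        # Assumed file name; pass ":memory:" for a throwaway database.
        self.db_path = db_path

    def open_spider(self, spider):
        self.conn = sqlite3.connect(self.db_path)
        # The PRIMARY KEY on name is what turns "same item name" into a conflict.
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS stocks (name TEXT PRIMARY KEY, price REAL)"
        )

    def process_item(self, item, spider):
        # INSERT OR REPLACE adds a new row, or replaces the existing row
        # that has the same primary key.
        self.conn.execute(
            "INSERT OR REPLACE INTO stocks (name, price) VALUES (?, ?)",
            (item["name"], item["price"]),
        )
        self.conn.commit()
        return item

    def close_spider(self, spider):
        self.conn.close()
```

To try it, the class would be registered under `ITEM_PIPELINES` in the project's settings.py. If a CSV file is still needed, it can be regenerated from the table after each crawl, so the file never has to be edited row by row.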


 