Is there a way, with Scrapy, to replace old data with new data when the items are the same?
For example, say I had scraped stock data from Yahoo Finance for a particular stock. Later, after new data has been released, I want to update that stock's data in the same output.csv
file I used before.
I'm somewhat surprised that Scrapy doesn't already do this from its command line (or maybe it does and I just can't find it).
I was thinking of configuring pipelines.py
to do the trick:
# pipelines.py:
import csv

class stockPipeline(object):
    # Scrapy calls process_item() on each scraped item
    def process_item(self, item, spider):
        with open('output.csv', 'rt') as f:
            reader = csv.DictReader(f)
            for stock in reader:
                if stock['name'] == item['name']:
                    # Somehow get Scrapy to overwrite this particular row...
                    # Or maybe get DictWriter to do it for us instead of Scrapy??
                    pass
        return item
Pipe the data into a real database and your problem is solved right away.
For instance, say you switch to MySQL
- in that case the problem simply comes down to inserting a record if it does not exist and updating it otherwise.
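In MySQL that insert-or-update is spelled INSERT ... ON DUPLICATE KEY UPDATE. A self-contained sketch of the same idea using the standard-library sqlite3 module and its INSERT OR REPLACE form (the table name and columns are illustrative assumptions):

```python
import sqlite3

# In-memory database standing in for the real MySQL instance
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE stocks (name TEXT PRIMARY KEY, price REAL)')

def upsert(item):
    # Because name is the primary key, a second insert with the same
    # name replaces the old row instead of adding a duplicate.
    conn.execute(
        'INSERT OR REPLACE INTO stocks (name, price) VALUES (:name, :price)',
        item)

upsert({'name': 'AAPL', 'price': 150.0})
upsert({'name': 'AAPL', 'price': 155.0})  # same key: row is updated

price = conn.execute("SELECT price FROM stocks WHERE name='AAPL'").fetchone()[0]
count = conn.execute('SELECT COUNT(*) FROM stocks').fetchone()[0]
```

A Scrapy pipeline would open the connection in open_spider() and run the upsert in process_item(), so each freshly scraped item silently replaces its stale predecessor.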