Is there a way, with Scrapy, to replace old data with new data when the items are the same?
For example, say I had scraped stock data from Yahoo Finance for a particular stock. Later, after new data has been released, I want to update that stock's data in the same output.csv
file I used before.
I'm somewhat surprised that Scrapy doesn't already do this from its command line (or maybe it does and I just can't find it).
I was thinking of configuring pipelines.py
to do the trick:
# pipelines.py:
import csv

class stockPipeline(object):
    # Scrapy calls process_item() on each scraped item
    def process_item(self, item, spider):
        with open('output.csv', 'rt') as f:
            reader = csv.DictReader(f)
            for stock in reader:
                if stock['name'] == item['name']:
                    # Somehow get Scrapy to overwrite this particular row...
                    # Or maybe get DictWriter to do it for us instead of Scrapy??
                    pass
        return item
Pipe the data into a real database and your problem is solved right away.
For instance, say you switch to MySQL
- in that case the problem simply comes down to inserting a record if it does not exist and updating it otherwise.
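In MySQL that insert-or-update is spelled INSERT ... ON DUPLICATE KEY UPDATE. A self-contained sketch of the same idea using the standard-library sqlite3 module and its INSERT OR REPLACE form (the table name and columns are illustrative assumptions):

```python
import sqlite3

# In-memory database standing in for the real MySQL instance
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE stocks (name TEXT PRIMARY KEY, price REAL)')

def upsert(item):
    # Because name is the primary key, a second insert with the same
    # name replaces the old row instead of adding a duplicate.
    conn.execute(
        'INSERT OR REPLACE INTO stocks (name, price) VALUES (:name, :price)',
        item)

upsert({'name': 'AAPL', 'price': 150.0})
upsert({'name': 'AAPL', 'price': 155.0})  # same key: row is updated

price = conn.execute("SELECT price FROM stocks WHERE name='AAPL'").fetchone()[0]
count = conn.execute('SELECT COUNT(*) FROM stocks').fetchone()[0]
```

A Scrapy pipeline would open the connection in open_spider() and run the upsert in process_item(), so each freshly scraped item silently replaces its stale predecessor.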