How to create a csv file dynamically with the name of the spider in scrapy python
(also titled: Scrapy: Create csv file with spider name)
I am currently trying to export the scraped data into files named after the spider.
Here is my pipelines.py:
from mydatacrowd.models import Datacrowd
from scrapy import signals
from scrapy.contrib.exporter import CsvItemExporter

class CsvExportPipeline(object):
    def __init__(self):
        self.files = {}

    @classmethod
    def from_crawler(cls, crawler):
        pipeline = cls()
        crawler.signals.connect(pipeline.spider_opened, signals.spider_opened)
        crawler.signals.connect(pipeline.spider_closed, signals.spider_closed)
        return pipeline

    def spider_opened(self, spider):
        print 'Hello world!'
        print spider.name
        file = open('%s.csv' % spider.name, 'w+b')
        self.files[spider] = file
        self.exporter = CsvItemExporter(file)
        self.exporter.start_exporting()

    def spider_closed(self, spider):
        self.exporter.finish_exporting()
        file = self.files.pop(spider)
        file.close()

    def process_item(self, item, spider):
        item.save()
        return item
Here is the relevant part of my settings.py:
...
ITEM_PIPELINES = {
    'datacrowdscrapy.pipelines.CsvExportPipeline': 1000,
}

FEED_FORMAT = 'csv'

FEED_EXPORTERS = {
    'csv': 'datacrowdscrapy.feedexport.CsvScrapperExporter'
}
...
And here is my feedexport.py:
from scrapy.conf import settings
from scrapy.contrib.exporter import CsvItemExporter

class CsvScrapperExporter(CsvItemExporter):
    def __init__(self, *args, **kwargs):
        kwargs['fields_to_export'] = settings.getlist('EXPORT_FIELDS') or None
        kwargs['encoding'] = settings.get('EXPORT_ENCODING', 'utf-8')
        super(CsvScrapperExporter, self).__init__(*args, **kwargs)
No file is created, no error is shown, and "Hello world!" never appears in the log. What am I missing?
Thanks!
Edit:
There is no FEED_URI parameter in my settings.py. Could that be the problem?
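For the record, Scrapy's built-in feed exports can already produce one file per spider without a custom pipeline: FEED_URI accepts a %(name)s placeholder that is replaced by the running spider's name. A minimal sketch (the values are illustrative):

```python
# settings.py -- illustrative feed-export configuration.
# %(name)s is expanded by Scrapy to the spider's name at run time,
# so each spider writes to its own CSV file.
FEED_URI = '%(name)s.csv'
FEED_FORMAT = 'csv'
```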
Looking at the source of the scrapy crawl command, Scrapy only reads the FEED_EXPORTERS setting if you pass an output option, like this:
scrapy crawl <spider_name> -o csv
From scrapy/commands/crawl.py:
if opts.output:
    ...
    valid_output_formats = self.settings['FEED_EXPORTERS'].keys() + \
        self.settings['FEED_EXPORTERS_BASE'].keys()
    ...
    self.settings.overrides['FEED_FORMAT'] = opts.output_format
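In other words, FEED_FORMAT and FEED_EXPORTERS only take effect when crawl is invoked with an output option. Assuming a spider named myspider (an illustrative name), the invocation would look like:

```shell
# -o gives the output file and -t the feed format; without -o,
# this code path never consults the FEED_EXPORTERS setting.
scrapy crawl myspider -o myspider.csv -t csv
```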