Setup
I'm using Scrapy 0.24.4 and Scrapy-ElasticSearch 0.5 to scrape a website and store the results in an Elasticsearch instance I have running.
I used this blog post to set it all up, with the minor modification that I documented here.
settings.py
BOT_NAME = 'blah'
SPIDER_MODULES = ['blah.spiders']
NEWSPIDER_MODULE = 'blah.spiders'
ITEM_PIPELINES = [
'scrapyelasticsearch.scrapyelasticsearch.ElasticSearchPipeline', 100
]
ELASTICSEARCH_SERVER = 'localhost'
ELASTICSEARCH_PORT = 9200
ELASTICSEARCH_INDEX = 'scrapy'
ELASTICSEARCH_TYPE = 'items'
Problem
If I run the following command to scrape a website:
scrapy crawl wiki -o wiki.json
With ITEM_PIPELINES commented out, it works correctly and exports all results to a wiki.json file.
With ITEM_PIPELINES uncommented (i.e. set to pipe results to Elasticsearch), I get the following error:
File "/usr/local/lib/python2.7/dist-packages/scrapy/utils/misc.py", line 34, in load_object
dot = path.rindex('.')
AttributeError: 'int' object has no attribute 'rindex'
Notes
Any help hugely appreciated.
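Stupid, stupid, stupid. It was a mistake in my own settings.py!
Having an ITEM_PIPELINES list is deprecated, so it needs to be a dictionary, but my attempt at converting to a dictionary was terribly mangled:
ITEM_PIPELINES = [
'scrapyelasticsearch.scrapyelasticsearch.ElasticSearchPipeline', 100
]
That is valid Python, but it is still a list, now with two elements: the pipeline path and the integer 100. Scrapy treats every element as a dotted path to load, so it calls rindex('.') on the integer and fails with the AttributeError above. It should have been a dictionary that maps the path to its order value: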
ITEM_PIPELINES = {
'scrapyelasticsearch.scrapyelasticsearch.ElasticSearchPipeline': 100
}
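To see the difference without running a crawl, here is a minimal sketch that reproduces the failure in plain Python. The helper names split_object_path and check are made up for illustration and are not part of Scrapy; only the path.rindex('.') call is taken from the traceback, and the rest is an assumption about how each entry gets processed.

```python
# Hypothetical stand-in for the per-entry lookup; only the rindex('.') call
# comes from the traceback, the rest is illustrative.
def split_object_path(path):
    dot = path.rindex('.')
    return path[:dot], path[dot + 1:]


# The mangled setting: a list holding a string AND the integer 100.
broken = [
    'scrapyelasticsearch.scrapyelasticsearch.ElasticSearchPipeline', 100
]

# The corrected setting: a dict mapping the path to its order value.
fixed = {
    'scrapyelasticsearch.scrapyelasticsearch.ElasticSearchPipeline': 100
}


def check(pipelines):
    """Return the first error message, or None if every entry splits cleanly."""
    for path in pipelines:  # iterating a dict yields its keys only
        try:
            split_object_path(path)
        except AttributeError as exc:
            return str(exc)
    return None


print(check(broken))
print(check(fixed))
```

With the list, the loop eventually reaches the bare integer and hits the same AttributeError. With the dictionary, only the string keys are visited, so the order value never gets treated as a path.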