
AttributeError: 'int' object has no attribute 'rindex'

Setup

I'm using Scrapy 0.24.4 and Scrapy-ElasticSearch 0.5 to scrape a website and store the results in a running Elasticsearch instance.

I followed this blog post to set it all up, with a minor modification that I documented here.

settings.py

BOT_NAME = 'blah'

SPIDER_MODULES = ['blah.spiders']
NEWSPIDER_MODULE = 'blah.spiders'

ITEM_PIPELINES = [
  'scrapyelasticsearch.scrapyelasticsearch.ElasticSearchPipeline', 100
]

ELASTICSEARCH_SERVER = 'localhost' 
ELASTICSEARCH_PORT = 9200 
ELASTICSEARCH_INDEX = 'scrapy'
ELASTICSEARCH_TYPE = 'items'

Problem

If I run the following command to scrape a website:

scrapy crawl wiki -o wiki.json

With ITEM_PIPELINES commented out, it works correctly and exports all results to a wiki.json file.

With ITEM_PIPELINES uncommented (i.e. set so that results are piped to Elasticsearch), I get the following error:

File "/usr/local/lib/python2.7/dist-packages/scrapy/utils/misc.py", line 34, in load_object
   dot = path.rindex('.')
AttributeError: 'int' object has no attribute 'rindex'

Notes

  • May or may not be relevant: I had to edit my local copy of the ElasticSearchPipeline Python file and comment out this block, which was causing syntax errors at the point where it indexes using uniq_id.

Any help hugely appreciated.

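Stupid, stupid, stupid. It was a mistake in my settings.py!

Defining ITEM_PIPELINES as a list is deprecated, so it needs to be a dictionary, but my attempt at converting it to a dictionary was terribly mangled:

ITEM_PIPELINES = [
  'scrapyelasticsearch.scrapyelasticsearch.ElasticSearchPipeline', 100
]

That is still a list (valid Python, but with two elements: the pipeline path and the integer 100), so Scrapy treats 100 as a second pipeline path and load_object fails when it calls path.rindex('.') on it. It should have been: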

ITEM_PIPELINES = {
  'scrapyelasticsearch.scrapyelasticsearch.ElasticSearchPipeline': 100
}
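
For anyone who wants to see the failure in isolation, here is a rough sketch of what happens. The load_object below is a simplified stand-in I wrote for illustration, not Scrapy's actual code, and try_load_all is a hypothetical helper. I also use 'collections.OrderedDict' as a placeholder path so the snippet runs without Scrapy or the Elasticsearch pipeline installed.

```python
from importlib import import_module


def load_object(path):
    """Simplified stand-in for scrapy.utils.misc.load_object (not the real code)."""
    dot = path.rindex('.')  # raises AttributeError if path is not a string
    module_name, name = path[:dot], path[dot + 1:]
    return getattr(import_module(module_name), name)


def try_load_all(pipelines):
    """Hypothetical helper: try each entry, return {entry: error message or None}."""
    results = {}
    for path in pipelines:  # iterating a dict yields only its keys
        try:
            load_object(path)
            results[path] = None
        except AttributeError as exc:
            results[path] = str(exc)
    return results


# Placeholder path standing in for the real pipeline class (assumption).
broken = ['collections.OrderedDict', 100]   # list: 100 is a second "path"
fixed = {'collections.OrderedDict': 100}    # dict: 100 is only the order value

print(try_load_all(broken))
print(try_load_all(fixed))
```

With the list, the integer is handed to load_object like any other entry, which is where the rindex error comes from. With the dict, only the key (the path string) is loaded and 100 is just the pipeline's order value.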


 