Storing the scraped data in MongoDB

I want to store the scraped data in MongoDB, but I am getting an error:

File "C:\\Python27\\lib\\site-packages\\six.py", line 599, in iteritems
    return d.iteritems(**kw)
AttributeError: 'list' object has no attribute 'iteritems'

I have not used iteritems anywhere in my program. Here is the code:

ex.py:

import scrapy
from example.items import ExampleItem


class ExampleSpider(scrapy.Spider):
    name = 'aaa'
    allowed_domains = ["in.bookmyshow.com"]
    start_urls = ["https://in.bookmyshow.com/movies"]

    def parse(self, response):
        # Collect links to individual movie pages and follow each one once.
        links = response.xpath('//a/@href').re(r'movies/[^/]+/.*$')
        for url in set(links):
            url = response.urljoin(url)
            yield scrapy.Request(url, callback=self.parse_movie)

    def parse_movie(self, response):
        # Each field holds a list of whitespace-stripped strings (Python 2).
        item = {}
        item['Moviename'] = map(unicode.strip, response.xpath('.//h1[@id="eventTitle"]/text()').extract())
        item['Language'] = map(unicode.strip, response.xpath('/html/body/div[1]/div[2]/div[1]/div[2]/div[1]/div[3]/span[1]/a/text()').extract())
        item['Info'] = map(unicode.strip, response.xpath('/html/body/div[1]/div[2]/div[1]/div[2]/div[1]/div[3]/span[3]/a/text()').extract())
        yield item

settings.py:

BOT_NAME = 'example'

SPIDER_MODULES = ['example.spiders']
NEWSPIDER_MODULE = 'example.spiders'

ITEM_PIPELINES = ['example.pipelines.MongoDBPipeline', ]

MONGODB_SERVER = "localhost"
MONGODB_PORT = 27017
MONGODB_DB = "ticketbook"
MONGODB_COLLECTION = "movies"

pipelines.py:

import pymongo

from scrapy.conf import settings
from scrapy.exceptions import DropItem
from scrapy import log

class ExamplePipeline(object):
    def __init__(self):
        connection = pymongo.Connection(settings['MONGODB_HOST'], settings['MONGODB_PORT'])
        db = connection[settings['MONGODB_DATABASE']]
        self.collection = db[settings['MONGODB_COLLECTION']]

    def process_item(self, item, spider):
        self.collection.insert(dict(item))
        log.msg("Item wrote to MongoDB database {}, collection {}, at host {}, port {}".format(
            settings['MONGODB_DATABASE'],
            settings['MONGODB_COLLECTION'],
            settings['MONGODB_HOST'],
            settings['MONGODB_PORT']))
        return item

I would like to know where I have gone wrong.

In your settings.py, change ITEM_PIPELINES from a list to a dictionary that maps each pipeline path to an order value, like so:

ITEM_PIPELINES = { 'example.pipelines.MongoDBPipeline': 100 }
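
To see why the list form triggers the error, here is a simplified illustration. It is not Scrapy's actual code, and `order_pipelines` is a hypothetical helper. The idea is that the setting is treated as a mapping from pipeline path to an order value (lower values run first), so anything that iterates it as key/value pairs breaks when handed a list. The traceback in the question comes from `six.iteritems` doing the equivalent on Python 2.

```python
def order_pipelines(item_pipelines):
    """Return pipeline paths sorted by their order value (lowest first).

    Hypothetical helper: it only mimics the idea of reading the setting
    as a mapping of path -> order, which is what a list cannot provide.
    """
    pairs = item_pipelines.items()  # a list has no such method
    return [path for path, order in sorted(pairs, key=lambda pair: pair[1])]


as_dict = {'example.pipelines.MongoDBPipeline': 100}
as_list = ['example.pipelines.MongoDBPipeline']

print(order_pipelines(as_dict))

try:
    order_pipelines(as_list)
except AttributeError as exc:
    # Same kind of failure as in the question's traceback.
    print("list form fails: {}".format(exc))
```

The value 100 is arbitrary here; it only matters relative to other pipelines in the same dictionary.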

See explanation: http://doc.scrapy.org/en/latest/topics/item-pipeline.html#activating-an-item-pipeline-component
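
Beyond the ITEM_PIPELINES fix, the question's code has a few mismatches that would likely surface next. The settings define MONGODB_SERVER and MONGODB_DB while the pipeline reads MONGODB_HOST and MONGODB_DATABASE, and the setting points at a class named MongoDBPipeline while the class shown is called ExamplePipeline. Below is a minimal sketch of a pipeline that lines up with the settings as posted. It assumes a recent Scrapy and PyMongo, where `pymongo.Connection`, `scrapy.conf` and `scrapy.log` are no longer available, so it uses `MongoClient`, `from_crawler` and the standard `logging` module instead. The default values passed to `get` and `getint` are assumptions for illustration.

```python
import logging

logger = logging.getLogger(__name__)


class MongoDBPipeline(object):
    """Stores each scraped item as one MongoDB document."""

    def __init__(self, server, port, db_name, collection_name):
        self.server = server
        self.port = port
        self.db_name = db_name
        self.collection_name = collection_name
        self.client = None
        self.collection = None

    @classmethod
    def from_crawler(cls, crawler):
        # Read the same setting names that settings.py defines.
        settings = crawler.settings
        return cls(
            server=settings.get('MONGODB_SERVER', 'localhost'),
            port=settings.getint('MONGODB_PORT', 27017),
            db_name=settings.get('MONGODB_DB'),
            collection_name=settings.get('MONGODB_COLLECTION'),
        )

    def open_spider(self, spider):
        # Imported here so the module can be loaded without the driver.
        import pymongo
        self.client = pymongo.MongoClient(self.server, self.port)
        self.collection = self.client[self.db_name][self.collection_name]

    def close_spider(self, spider):
        if self.client is not None:
            self.client.close()

    def process_item(self, item, spider):
        self.collection.insert_one(dict(item))
        logger.debug("Item written to MongoDB database %s, collection %s",
                     self.db_name, self.collection_name)
        return item
```

Opening the connection in `open_spider` rather than in `__init__` keeps construction cheap and lets `close_spider` release the client when the crawl ends.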
