
Scrapy - pymongo not inserting items to DB

So I'm playing around with scrapy, trying to learn, and using MongoDB as my database I've hit a dead end. Basically the scraping works, since the items I'm fetching show up in the terminal log, but I can't get the data posted to my database. The MONGO_URI is correct, since I've tried it in the python shell, where I can create and store data.
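For reference, the kind of shell check described above might look like this sketch (the URI, database, and collection names here are placeholders, not the asker's real values):

import pymongo

client = pymongo.MongoClient("mongodb+srv://user:pass@cluster.example.mongodb.net")
db = client["materials"]
# round-trip a test document to confirm both writes and reads succeed
result = db["my-prices"].insert_one({"smoke_test": True})
print(db["my-prices"].find_one({"_id": result.inserted_id}))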

Here are my files.

items.py


import scrapy

class MaterialsItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    title = scrapy.Field()
    price = scrapy.Field()
   ## url = scrapy.Field()
    pass

spider.py

import scrapy
from scrapy.selector import Selector

from ..items import MaterialsItem

class mySpider(scrapy.Spider):
    name = "<placeholder for post>"
    allowed_domains = ["..."]
    start_urls = [
   ...
    ]

    def parse(self, response):
        products = Selector(response).xpath('//div[@class="content"]')

        for product in products:
            item = MaterialsItem()
            item['title'] = product.xpath("//a[@class='product-card__title product-card__title-v2']/text()").extract()
            item['price'] = product.xpath("//div[@class='product-card__price-value ']/text()").extract()
            ## item['url'] =
            yield item
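One thing to watch in the loop above: XPath expressions that start with // search the entire document, not the current product node, so every item can end up with the same full-page extraction. A relative .// form, sketched here with the same class names, scopes each field to its own product card:

        for product in products:
            item = MaterialsItem()
            # './/' makes the query relative to the current product node
            item['title'] = product.xpath(".//a[@class='product-card__title product-card__title-v2']/text()").extract_first()
            item['price'] = product.xpath(".//div[@class='product-card__price-value ']/text()").extract_first()
            yield item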

settings.py

MONGO_PIPELINES = {
    'materials.pipelines.MongoPipeline': 300,
}


#setup mongo DB
MONGO_URI = "my MongoDB Atlas address"
MONGO_DB = "materials"

pipelines.py

import logging

import pymongo

class MongoPipeline(object):

    collection_name = 'my-prices'

    def __init__(self, mongo_uri, mongo_db):
        self.mongo_uri = mongo_uri
        self.mongo_db = mongo_db

    @classmethod
    def from_crawler(cls, crawler):
        ## pull in information from settings.py
        return cls(
            mongo_uri=crawler.settings.get('MONGO_URI'),
            mongo_db=crawler.settings.get('MONGO_DB', '<placeholder-spider name>')
        )

    def open_spider(self, spider):
        ## initializing spider
        ## opening db connection
        self.client = pymongo.MongoClient(self.mongo_uri)
        self.db = self.client[self.mongo_db]

    def close_spider(self, spider):
        ## clean up when spider is closed
        self.client.close()

    def process_item(self, item, spider):
        ## how to handle each post
        self.db[self.collection_name].insert(dict(item))
        logging.debug("Post added to MongoDB")
        return item
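One caveat worth flagging: Collection.insert was deprecated in PyMongo 3.x and removed in PyMongo 4.0, so on a recent driver process_item would raise even once the pipeline runs. A sketch of the same method against the current API:

    def process_item(self, item, spider):
        ## insert_one is the modern replacement for the removed insert
        self.db[self.collection_name].insert_one(dict(item))
        logging.debug("Post added to MongoDB")
        return item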

Any help would be great!

**Edit**

File structure

materials
    spiders
        my-spider
    items.py
    pipelines.py
    settings.py

Shouldn't the line in the MongoPipeline class:

collection_name = 'my-prices'

be:

self.collection_name = 'my-prices'

since you call:

self.db[self.collection_name].insert(dict(item))
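That change isn't actually needed: when an attribute lookup misses on the instance, Python falls back to the class, so self.collection_name already resolves to the class attribute. A minimal illustration (the class name here is just for the example):

class Example:
    collection_name = 'my-prices'  # class attribute

e = Example()
print(e.collection_name)  # 'my-prices' -- found on the class via the instance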

I figured it out, I went back over everything. It turns out that in settings.py I had to change:

MONGO_PIPELINES = {
    'materials.pipelines.MongoPipeline': 300,
}

to:

ITEM_PIPELINES = {
    'materials.pipelines.MongoPipeline': 300,
}

I guess I shouldn't have changed the naming format from ITEM_PIPELINES to MONGO_PIPELINES. Scrapy only reads pipeline registrations from the ITEM_PIPELINES setting, so a pipeline listed under a custom key like MONGO_PIPELINES is silently ignored and never runs.
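For completeness, the relevant part of settings.py after the fix might look like the sketch below (MONGO_URI kept as the asker's placeholder):

# pipelines must be registered under the ITEM_PIPELINES key for Scrapy to load them
ITEM_PIPELINES = {
    'materials.pipelines.MongoPipeline': 300,
}

MONGO_URI = "my MongoDB Atlas address"
MONGO_DB = "materials"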

What was the error in the code? I imagine, if possible, I'd need to do it under __init__. Could you upload it to git? I might try and take a look.
