Scrapy - pymongo not inserting items to DB
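I'm playing around with Scrapy to learn it, using MongoDB as my database, and I've hit a dead end. The scraping itself works, since the items I'm fetching show up in the terminal log, but I can't get the data written to my database. The MONGO_URI is correct: I tried it in the Python shell, where I can create and store data.
Here are my files: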
items.py
import scrapy

class MaterialsItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    title = scrapy.Field()
    price = scrapy.Field()
    ## url = scrapy.Field()
    pass
spider.py
import scrapy
from scrapy.selector import Selector
from ..items import MaterialsItem

class mySpider(scrapy.Spider):
    name = "<placeholder for post>"
    allowed_domains = ["..."]
    start_urls = [
        ...
    ]

    def parse(self, response):
        products = Selector(response).xpath('//div[@class="content"]')
        for product in products:
            item = MaterialsItem()
            item['title'] = product.xpath("//a[@class='product-card__title product-card__title-v2']/text()").extract(),
            item['price'] = product.xpath("//div[@class='product-card__price-value ']/text()").extract()
            ## product['url'] =
            yield item
settings.py
MONGO_PIPELINES = {
    'materials.pipelines.MongoPipeline': 300,
}
#setup mongo DB
MONGO_URI = "my MongoDB Atlas address"
MONGO_DB = "materials"
pipelines.py
import logging
import pymongo

class MongoPipeline(object):
    collection_name = 'my-prices'

    def __init__(self, mongo_uri, mongo_db):
        self.mongo_uri = mongo_uri
        self.mongo_db = mongo_db

    @classmethod
    def from_crawler(cls, crawler):
        ## pull in information from settings.py
        return cls(
            mongo_uri=crawler.settings.get('MONGO_URI'),
            mongo_db=crawler.settings.get('MONGO_DB', ', <placeholder-spider name>')
        )

    def open_spider(self, spider):
        ## initializing spider
        ## opening db connection
        self.client = pymongo.MongoClient(self.mongo_uri)
        self.db = self.client[self.mongo_db]

    def close_spider(self, spider):
        ## clean up when spider is closed
        self.client.close()

    def process_item(self, item, spider):
        ## how to handle each post
        self.db[self.collection_name].insert(dict(item))
        logging.debug("Post added to MongoDB")
        return item
Any help would be great!
Edit:
File structure
materials
    spiders
        my-spider
    items.py
    pipelines.py
    settings.py
Shouldn't this line in the MongoPipeline class:
    collection_name = 'my-prices'
be:
    self.collection_name = 'my-prices'
since you call:
    self.db[self.collection_name].insert(dict(item))
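As a side note on the suggestion above, here is a minimal sketch of how Python resolves an attribute defined at class level. The class name `Pipeline` and the method `target` are hypothetical and exist only for this illustration. It suggests that `self.collection_name` already finds the class attribute, and a bare `self.collection_name = ...` line in the class body would raise a NameError because `self` is not defined there.

```python
class Pipeline:
    # Class attribute: defined once on the class, shared by all instances.
    collection_name = "my-prices"

    def target(self):
        # Lookup on self falls back to the class when the instance
        # has no attribute of that name.
        return self.collection_name


p = Pipeline()
print(p.target())
# The instance itself holds no 'collection_name' entry; it lives on the class.
print("collection_name" in vars(p))
```

So the class-level definition in the question is probably not what prevents the inserts.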
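I figured it out after going back over everything. It turned out that in settings.py I had to change
    MONGO_PIPELINES = {
        'materials.pipelines.MongoPipeline': 300,
    }
to
    ITEM_PIPELINES = {
        'materials.pipelines.MongoPipeline': 300,
    }
I guess I shouldn't have renamed the setting from ITEM_PIPELINES to MONGO_PIPELINES. Scrapy only looks up pipelines under ITEM_PIPELINES, so the pipeline was never enabled.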
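For reference, a minimal sketch of how the corrected pieces could fit together, shown in one block although the setting belongs in settings.py and the class in pipelines.py. Several things here are assumptions rather than the asker's code: the `"materials"` fallback for MONGO_DB, the lazy `import pymongo` inside `open_spider` (done only so the module loads without the driver installed), and the switch to `insert_one`, since `Collection.insert` was deprecated in PyMongo 3 and removed in PyMongo 4.

```python
import logging

logger = logging.getLogger(__name__)

# settings.py (fragment): Scrapy enables pipelines listed under ITEM_PIPELINES.
ITEM_PIPELINES = {
    "materials.pipelines.MongoPipeline": 300,
}


# pipelines.py
class MongoPipeline:
    collection_name = "my-prices"

    def __init__(self, mongo_uri, mongo_db):
        self.mongo_uri = mongo_uri
        self.mongo_db = mongo_db

    @classmethod
    def from_crawler(cls, crawler):
        # Read connection details from settings.py.
        # The "materials" fallback is an assumption for this sketch.
        return cls(
            mongo_uri=crawler.settings.get("MONGO_URI"),
            mongo_db=crawler.settings.get("MONGO_DB", "materials"),
        )

    def open_spider(self, spider):
        # Imported here (an assumption of this sketch) so the module
        # can be loaded even where pymongo is not installed.
        import pymongo

        self.client = pymongo.MongoClient(self.mongo_uri)
        self.db = self.client[self.mongo_db]

    def close_spider(self, spider):
        self.client.close()

    def process_item(self, item, spider):
        # insert_one stores a single document; it replaces the removed insert().
        self.db[self.collection_name].insert_one(dict(item))
        logger.debug("Post added to MongoDB")
        return item
```

With the setting named ITEM_PIPELINES, the crawl log should list the pipeline under the enabled item pipelines at startup, which is a quick way to confirm it is being picked up.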
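What is the error in the code? I think that assignment would need to go under __init__. If possible, could you upload the project to git? I might try taking a look.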