How to create an index in MongoDB with pymongo

I use Scrapy to crawl data and save it to MongoDB, and I want to create a 2dsphere index in MongoDB.

Here is my pipelines.py file for Scrapy:

from pymongo import MongoClient
from scrapy.conf import settings

class MongoDBPipeline(object):

    global theaters
    theaters = []

    def __init__(self):
        connection = MongoClient(
            settings['MONGODB_SERVER'],
            settings['MONGODB_PORT'])
        self.db = connection[settings['MONGODB_DB']]
        self.collection = self.db[settings['MONGODB_COLLECTION']]

    def open_spider(self, spider):
        print 'Pipelines => open_spider =>'

    def process_item(self, item, spider):

        global theaters
        # get the class item name to be collection name
        self.collection = self.db[type(item).__name__.replace('_Item','')]

        if  item['theater'] not in theaters:
            print 'remove=>',item['theater']
            theaters.append(item['theater'])
            self.collection.remove({'theater': item['theater']})

        # insert the collection name that is from class object item
        self.collection.insert(dict(item))
        # Here is where I try to create the 2dsphere index
        self.collection.create_index({"location": "2dsphere"})

        return item

When I use self.collection.create_index({"location": "2dsphere"})

it raises the error TypeError: if no direction is specified, key_or_list must be an instance of list
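The message suggests that the pymongo version used here wants either a single key name or a list of (key, index type) pairs, not a dict. A minimal sketch of turning the dict form into the list form follows; to_key_list is a hypothetical helper written for illustration and is not part of pymongo.

```python
GEOSPHERE = "2dsphere"  # same string value as pymongo.GEOSPHERE


def to_key_list(spec):
    """Convert {"location": "2dsphere"} into [("location", "2dsphere")]."""
    if isinstance(spec, dict):
        return list(spec.items())
    # Already a sequence of (key, index type) pairs: copy it into a list.
    return list(spec)


keys = to_key_list({"location": GEOSPHERE})
print(keys)
# With a real collection this list would be passed as:
#     collection.create_index(keys)
```

The list form keeps the key order explicit, which matters once an index covers more than one field.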

If I try

self.collection.create_index([('location', "2dsphere")], name='search_index', default_language='english')

There is no error any more, but my MongoDB collection still doesn't have any index under location. [image]

I think I follow the GeoJSON format.
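For reference, a 2dsphere index expects the indexed field to hold a GeoJSON object such as a Point, with longitude listed before latitude. The sketch below builds such a value; geojson_point, the theater name, and the coordinates are made up for illustration and do not come from the question's data.

```python
def geojson_point(longitude, latitude):
    """Build a GeoJSON Point; longitude comes first, then latitude."""
    if not -180 <= longitude <= 180:
        raise ValueError("longitude must be between -180 and 180")
    if not -90 <= latitude <= 90:
        raise ValueError("latitude must be between -90 and 90")
    return {"type": "Point", "coordinates": [longitude, latitude]}


# Hypothetical item in the shape a pipeline might insert.
item = {
    "theater": "Example Theater",
    "location": geojson_point(121.5654, 25.0330),
}
print(item["location"])
```

Swapping the two coordinates is a common mistake, and the range check catches it whenever the longitude value falls outside the valid latitude range.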

Is there any way to create a 2dsphere index in MongoDB when using Scrapy? Or should I just save the data in the structure shown in the image and create the index from another server-side file (for example with Node.js)?

Any help would be appreciated. Thanks in advance.

Following Adam Harrison's response, I tried renaming my MongoDB field from location to geometry.

Then I added import pymongo to my pipelines.py file

and used self.collection.create_index([("geometry", pymongo.GEOSPHERE)])

There is no error, but I still can't find the index under geometry. [image]
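One thing worth checking: indexes are collection metadata and are not stored inside the documents themselves, so an index would not be expected to show up under the geometry field when browsing a document. pymongo's Collection.index_information() returns the indexes defined on a collection. The sketch below scans a dict of that shape; find_geo_indexes is a hypothetical helper, and sample_info is illustrative data, not real output from the asker's database.

```python
def find_geo_indexes(index_info, field):
    """Return the names of 2dsphere indexes defined on `field`.

    `index_info` is a dict shaped like the result of
    pymongo's Collection.index_information().
    """
    names = []
    for name, details in index_info.items():
        for key, kind in details.get("key", []):
            if key == field and kind == "2dsphere":
                names.append(name)
    return names


# Illustrative data only; with a real collection use:
#     index_info = collection.index_information()
sample_info = {
    "_id_": {"v": 2, "key": [("_id", 1)]},
    "geometry_2dsphere": {"v": 2, "key": [("geometry", "2dsphere")]},
}
print(find_geo_indexes(sample_info, "geometry"))
```

If the returned list is non-empty for the real collection, the index exists even though it is not visible in the document view.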

For me it was necessary to use ItemAdapter to convert the Item parameter into a dict. After that I was able to query the database.

import pymongo
from itemadapter import ItemAdapter
from scrapy.exceptions import DropItem


class MongoPipeline:
    # The class statement is missing from the original post;
    # the name MongoPipeline is assumed here.
    collection_name = 'myCollection'

    def __init__(self, mongo_uri, mongo_db):
        self.mongo_uri = mongo_uri
        self.mongo_db = mongo_db

    @classmethod
    def from_crawler(cls, crawler):
        # Read the connection settings from the Scrapy settings
        return cls(
            mongo_uri=crawler.settings.get('MONGO_URI'),
            mongo_db=crawler.settings.get('MONGO_DATABASE', 'items')
        )

    def open_spider(self, spider):
        self.client = pymongo.MongoClient(self.mongo_uri)
        self.db = self.client[self.mongo_db]

    def close_spider(self, spider):
        self.client.close()

The process_item function:

def process_item(self, item, spider):
    adapter = ItemAdapter(item)
    collection = self.db[self.collection_name]
    if collection.find_one({'id': adapter['id']}) is not None:
        # find_one_and_update requires an update document; the original call
        # omitted it. Refreshing the stored fields with $set is an assumption.
        dado = collection.find_one_and_update(
            {'id': adapter['id']}, {'$set': adapter.asdict()})
        ## ----> raise DropItem(f"Duplicate item found: {item!r}") <------
        print(f"Duplicate item found: {dado!r}")
    else:
        collection.insert_one(adapter.asdict())
    return item
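Relating this back to the question: the index only needs to be created once per crawl, so open_spider may be a better place for create_index than process_item. The following is a rough sketch under that assumption. GeoIndexPipeline and FakeCollection are hypothetical names, and passing the collection into the constructor is a simplification so the example runs without a MongoDB server; a real Scrapy pipeline would open the connection itself, as in the answer above.

```python
class GeoIndexPipeline:
    """Sketch of a pipeline that creates the 2dsphere index once per crawl."""

    def __init__(self, collection, field="location"):
        # `collection` is expected to behave like a pymongo Collection.
        self.collection = collection
        self.field = field

    def open_spider(self, spider):
        # Runs once when the spider starts, not once per item.
        self.collection.create_index([(self.field, "2dsphere")])

    def process_item(self, item, spider):
        self.collection.insert_one(dict(item))
        return item


class FakeCollection:
    """Stand-in that records calls so the sketch runs without MongoDB."""

    def __init__(self):
        self.indexes = []
        self.documents = []

    def create_index(self, keys):
        self.indexes.append(keys)

    def insert_one(self, document):
        self.documents.append(document)


fake = FakeCollection()
pipeline = GeoIndexPipeline(fake)
pipeline.open_spider(spider=None)
pipeline.process_item({"theater": "A"}, spider=None)
pipeline.process_item({"theater": "B"}, spider=None)
print(len(fake.indexes), len(fake.documents))
```

With this layout the index request is sent once regardless of how many items pass through the pipeline.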
