How to create index in mongodb with pymongo
I use scrapy to crawl data and save it to mongodb, and I want to create a 2dsphere index in mongodb.
Here is my pipelines.py file for scrapy:
from pymongo import MongoClient
from scrapy.conf import settings

class MongoDBPipeline(object):
    global theaters
    theaters = []

    def __init__(self):
        connection = MongoClient(
            settings['MONGODB_SERVER'],
            settings['MONGODB_PORT'])
        self.db = connection[settings['MONGODB_DB']]
        self.collection = self.db[settings['MONGODB_COLLECTION']]

    def open_spider(self, spider):
        print 'Pipelines => open_spider =>'

    def process_item(self, item, spider):
        global theaters
        # get the class item name to be the collection name
        self.collection = self.db[type(item).__name__.replace('_Item', '')]
        if item['theater'] not in theaters:
            print 'remove=>', item['theater']
            theaters.append(item['theater'])
            self.collection.remove({'theater': item['theater']})
        # insert the collection name that is from the class object item
        self.collection.insert(dict(item))
        # Here is how I try to create the 2dsphere index
        self.collection.create_index({"location": "2dsphere"})
        return item
When I use self.collection.create_index({"location": "2dsphere"}), it shows the error:
TypeError: if no direction is specified, key_or_list must be an instance of list
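That TypeError comes from the shape of the argument: create_index in pymongo accepts either a single key name or a list of (field, direction) pairs, and a plain dict is rejected because it does not carry an ordered direction per key. A minimal sketch of the two forms (the collection call is commented out since it assumes a live pymongo Collection):

```python
# pymongo's create_index wants a key name or a list of (field, direction)
# pairs -- passing a plain dict raises the TypeError shown above.
bad_spec = {"location": "2dsphere"}       # rejected form
good_spec = [("location", "2dsphere")]    # accepted: list of (key, direction)

# pymongo.GEOSPHERE is just the string "2dsphere", so this is equivalent:
# collection.create_index(good_spec)
print(good_spec)
```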
If I try
self.collection.create_index([('location', "2dsphere")], name='search_index', default_language='english')
there is no error any more, but my mongodb still doesn't have any index under location.
I think I follow the GeoJSON format.
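For a 2dsphere index to be usable, each document's indexed field must hold valid GeoJSON, and MongoDB requires coordinates in [longitude, latitude] order. A minimal sketch of a conforming document (field names are hypothetical, matching the item fields in the question):

```python
# A minimal GeoJSON Point as a 2dsphere index expects it.
# Coordinate order is [longitude, latitude] -- reversing them is a common
# reason geo queries silently match nothing.
theater_doc = {
    "theater": "Example Theater",           # hypothetical item field
    "location": {
        "type": "Point",
        "coordinates": [121.5654, 25.0330]  # lng, lat (Taipei, for example)
    },
}
print(theater_doc["location"]["type"])
```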
Is there any way to save a 2dsphere index in mongodb when I am using scrapy? Or should I just save the data like the photo structure and create the index from another server file (like nodejs)?
Any help would be appreciated. Thanks in advance.
According to Adam Harrison's response, I tried changing my mongodb field name from location to geometry, then added import pymongo to my pipelines.py file and used
self.collection.create_index([("geometry", pymongo.GEOSPHERE)])
There is no error, but I still can't find the index under geometry.
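One way to see whether the index was actually created is Collection.index_information(); a 2dsphere index on geometry appears under a name like geometry_2dsphere. A small helper, assuming the dict shape that index_information() returns (the sample below is hand-written in that shape, not taken from a live server):

```python
def has_2dsphere_index(index_info, field):
    # index_info is the dict returned by Collection.index_information():
    # {index_name: {"key": [(field, direction), ...], ...}, ...}
    return any(
        (field, "2dsphere") in spec.get("key", [])
        for spec in index_info.values()
    )

# Simulated index_information() output (shape only):
sample = {
    "_id_": {"key": [("_id", 1)], "v": 2},
    "geometry_2dsphere": {"key": [("geometry", "2dsphere")], "v": 2},
}
print(has_2dsphere_index(sample, "geometry"))   # True
print(has_2dsphere_index(sample, "location"))   # False
```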
For me it was necessary to use the ItemAdapter to convert the Item parameter into a dict. That way I was able to query the database.
from itemadapter import ItemAdapter
import pymongo
from scrapy.exceptions import DropItem

class MongoPipeline:
    collection_name = 'myCollection'

    def __init__(self, mongo_uri, mongo_db):
        self.mongo_uri = mongo_uri
        self.mongo_db = mongo_db

    @classmethod
    def from_crawler(cls, crawler):
        return cls(
            mongo_uri=crawler.settings.get('MONGO_URI'),
            mongo_db=crawler.settings.get('MONGO_DATABASE', 'items')
        )

    def open_spider(self, spider):
        self.client = pymongo.MongoClient(self.mongo_uri)
        self.db = self.client[self.mongo_db]

    def close_spider(self, spider):
        self.client.close()
The process_item function:

    def process_item(self, item, spider):
        adapter = ItemAdapter(item)
        # find_one is enough for the duplicate check;
        # find_one_and_update would also require an update document
        dado = self.db[self.collection_name].find_one({'id': adapter['id']})
        if dado is not None:
            ## ----> raise DropItem(f"Duplicate item found: {item!r}") <------
            print(f"Duplicate item found: {dado!r}")
        else:
            self.db[self.collection_name].insert_one(adapter.asdict())
        return item
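The find-then-insert pattern above leaves a window between the lookup and the write; a single upsert collapses both into one call. A minimal sketch that only builds and returns the filter/update documents (the actual update_one call is commented out since it assumes a live pymongo Collection):

```python
# Collapse "check for duplicate, then insert" into one upsert.
def upsert_docs(item_dict):
    filter_doc = {"id": item_dict["id"]}   # match on the item's own id
    update_doc = {"$set": item_dict}       # insert or overwrite the fields
    # collection.update_one(filter_doc, update_doc, upsert=True)
    return filter_doc, update_doc

f, u = upsert_docs({"id": 7, "theater": "Example"})
print(f)   # {'id': 7}
```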