MongoDB: optimize multiple find_one + insert inside a loop
I'm using MongoDB 4.0.1 and PyMongo with Python 3.5. I have to loop over 12000 items every 30-60 seconds and add new data into MongoDB. For this example we will talk about User, Pet and Car. A User can have 1 Car and 1 Pet.

I need the pet ObjectId and the car ObjectId to create my User, so I have to add them one by one in the loop, and this is very slow. It takes ~25 seconds to find the existing data and add it when it does not exist.
user_data = []
while dictionary != False:
    # Create the pet if it does not exist
    existing_pet = pet.find_one({"code": dictionary['pet_code']})
    if existing_pet:
        pet_id = existing_pet['_id']
    else:
        pet_id = pet.insert({
            "code": dictionary['pet_code'],
            "name": dictionary['name']
        })
        # Call web service to create the pet remotely

    # Create the car if it does not exist
    existing_car = car.find_one({"platenumber": dictionary['platenumber']})
    if existing_car:
        car_id = existing_car['_id']
    else:
        car_id = car.insert({
            "platenumber": dictionary['platenumber'],
            "model": dictionary['model'],
            "energy": 'electric'
        })
        # Call web service to create the car remotely

    # Queue the user for insertion if it does not exist
    existing_user = user.find_one(
        {"$and": [
            {"user_code": dictionary['user_code']},
            {"car": car_id},
            {"pet": pet_id}
        ]}
    )
    if not existing_user:
        user_data.append({
            "pet": pet_id,
            "car": car_id,
            "user_code": dictionary['user_code'],
            "firstname": dictionary['firstname'],
            "lastname": dictionary['lastname']
        })
        # Call web service to create the user remotely

# Bulk insert users
if user_data:
    user.insert_many(user_data)
I created indexes for each field used in the find_one queries:
db.user.createIndex( { user_code: 1 } )
db.user.createIndex( { pet: 1 } )
db.user.createIndex( { car: 1 } )
db.pet.createIndex( { code: 1 }, { unique: true } )
db.car.createIndex( { platenumber: 1 }, { unique: true } )
Is there a way to speed up this loop? Is there something in the aggregation framework, or anything else, that could help me? Or maybe another way to do what I want?

I'm open to all advice.
Don't do 12000 find_one queries; do one query that brings back everything that already exists, using the $in operator. The code would be something like:
pet_codes = []
pet_names = []
while dictionary != False:
    pet_codes.append(dictionary['pet_code'])
    pet_names.append(dictionary['pet_name'])

# One query with $in instead of one find_one per item
pets = dict()
for existing in pet.find({"code": {"$in": pet_codes}}):
    pets[existing['code']] = existing

new_pets = []
for code, name in zip(pet_codes, pet_names):
    if code not in pets:
        new_pets.append({'code': code, 'name': name})
if new_pets:
    pet.insert_many(new_pets)
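The split between existing and new pets can be factored into a small helper and checked without a running MongoDB. This is a sketch with illustrative names (`split_new` and its arguments are not part of the answer's code):

```python
def split_new(pet_codes, pet_names, existing_codes):
    """Return pet documents that still need inserting, in input order,
    skipping any code already present in the collection."""
    existing = set(existing_codes)
    return [
        {"code": code, "name": name}
        for code, name in zip(pet_codes, pet_names)
        if code not in existing
    ]

# Example: only "b" is new, so only its document is built.
print(split_new(["a", "b"], ["Rex", "Milo"], ["a"]))
# → [{'code': 'b', 'name': 'Milo'}]
```

The `existing_codes` argument would come from the single `$in` query above; keeping the pure filtering logic separate from the database calls also makes it easy to unit-test.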
As you already have a unique index on the pet code, we can do better: just try to insert them all. If we try to insert an existing one, that record will get a duplicate-key error, but the rest will succeed when we pass ordered=False (from the docs):
from pymongo.errors import BulkWriteError

new_pets = []
while dictionary != False:
    new_pets.append({
        "code" : dictionary['pet_code'],
        "name" : dictionary['name']
    })
try:
    pet.insert_many(new_pets, ordered=False)
except BulkWriteError:
    # Duplicate-key errors are expected for pets that already exist;
    # with ordered=False all non-duplicate inserts still succeed.
    pass
In the case where you do not have a unique constraint set, another method is batching the operations.
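For example, a large list of write operations can be cut into fixed-size chunks and each chunk sent in one round trip. The helper below is a stdlib-only sketch; the `bulk_write` call in the comment assumes a PyMongo collection and a hypothetical `requests` list of operations:

```python
def batched(ops, size=1000):
    """Yield successive chunks of at most `size` operations."""
    for i in range(0, len(ops), size):
        yield ops[i:i + size]

# Usage sketch with PyMongo (hypothetical `collection` and `requests`):
# for chunk in batched(requests):
#     collection.bulk_write(chunk, ordered=False)
print([len(chunk) for chunk in batched(list(range(2500)))])  # → [1000, 1000, 500]
```

Chunk size is a trade-off: larger chunks mean fewer round trips, but each request must stay within the server's message size limits.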