How to update / insert multiple documents based on condition in one query MongoDB
I have the following structure for one document:
{
    "uid": uid,
    "at": at,
    "url": url,
    "members": members
}
I would like to bulk create the documents, or update them if they already exist based on their uid, as I am passing many objects like the above in one request.
I want something that looks like this (in Python):
ids = []
for item in items:
    ids.append(item["uid"])
db.collection.update_many({"uid": {"$in": ids}}, {"$set": items}, upsert=True)
But will it update the correct documents according to the correct ids, and insert the ones where uid doesn't exist?
I would like to perform the insert OR update: whether the object exists or not, it will either insert it or replace it based on the uid.
Thanks in advance
There is no direct way to upsert a list of keys with a list of contents, e.g. (this is pseudocode, not valid MQL):
keys = ['A','B'];
content = [{key:'A',name:'X',pos:1},{key:'B',name:'Y',pos:2}];
db.collection.updateMany(keys, {$set: content});
The updateMany function is designed to match a relatively small number of docs and apply a (mostly) constant update pattern, which is NOT the desired operation here.
It is possible to easily do this in a two step way: first bulk load the items into a temp collection, then use the $merge operator to overlay the material onto the target collection. Consider this existing target collection foo:
db.foo.insert([
    {uid: 0, at: "A", members: [1,2]},
    {uid: 1, at: "B", members: [1,2]},
    {uid: 2, at: "C", members: [1,2]},
    {uid: 3, at: "D", members: [1,2]}
]);
Assume we construct an array of items similar to what the OP posted:
var items = [
    {uid:1, at:"XX"},                 // change
    {uid:3, at:"XX", members:[3,4]}, // change
    {uid:7, at:"Q", members:[3,4]},  // new
    {uid:8, at:"P", members:[3,4]}   // new
];
The following pipeline will merge these onto the target collection:
db.TEMP1.drop(); // OK if does not exist
// Blast items into the TEMP1 collection. You could use the bulk
// insert functions https://docs.mongodb.com/manual/reference/method/Bulk.insert/ if you prefer,
// or in theory even use mongoimport outside of this script to bulk
// load material into TEMP1.
db.TEMP1.insert(items);
c = db.TEMP1.aggregate([
    {$project: {_id: false}}, // must exclude _id to prevent conflict with target 'foo'
    {$merge: {
        into: "foo", // ah HA!
        on: [ "uid" ],
        whenMatched: "replace",
        whenNotMatched: "insert"
    }}
]);
db.foo.find();
{ "_id" : 0, "uid" : 0, "at" : "A", "members" : [ 1, 2 ] } // unchanged
{ "_id" : 1, "uid" : 1, "at" : "XX" } // modified and members gone
{ "_id" : 2, "uid" : 2, "at" : "C", "members" : [ 1, 2 ] } // unchanged
{ "_id" : 3, "uid" : 3, "at" : "XX", "members" : [ 3, 4 ] } // modified
{ "_id" : ObjectId("6228be517dec6ed6fa40cbe3"), "uid" : 7, "at" : "Q", "members" : [ 3, 4 ] } // inserted
{ "_id" : ObjectId("6228be517dec6ed6fa40cbe4"), "uid" : 8, "at" : "P", "members" : [ 3, 4 ] } // inserted
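Since the OP asked in Python, the same two-step flow can be sketched with pymongo. This is a minimal translation of the shell script above (collection names TEMP1 and foo come from the answer; a db handle is passed in, so nothing touches a server until merge_items is called):

```python
# Pipeline mirroring the shell script above: strip _id, then merge on uid.
MERGE_PIPELINE = [
    {"$project": {"_id": False}},  # must exclude _id to prevent conflict with target 'foo'
    {"$merge": {
        "into": "foo",
        "on": ["uid"],
        "whenMatched": "replace",
        "whenNotMatched": "insert",
    }},
]

def merge_items(db, items):
    """Load items into a temp collection, then $merge them onto foo."""
    db.foo.create_index("uid", unique=True)  # $merge's 'on' field requires a unique index
    db.TEMP1.drop()
    db.TEMP1.insert_many(items)
    db.TEMP1.aggregate(MERGE_PIPELINE)
```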
The caveat is a unique index must exist on uid in the target collection, but that is probably what you would want to have anyway for such an operation to be reasonably performant.
This approach is more client-side data oriented, i.e. it assumes the client side is building up big, complete structures of data to either overlay or insert, as opposed to update, which is more "surgical" in its ability to selectively touch as few fields as possible.
For quantitative comparison, 2 million docs sized roughly as above were loaded into foo. A script that uses bulk operations loaded 50,000 items into TEMP1, of which half had a matching uid (for update) and half did not (for insert), followed by the running of the $merge script. On a MacBookPro w/ 16GB RAM and SSD disk and a v4.4.2 server, the times are:
1454 millis to insert 50013; 34396 ins/sec
7144 millis to merge; 7000 upd/sec
8598 total millis; 5816 ops/sec
If done in a simple loop:
33452 millis for loop update; 1495 upd/sec
So the temp collection and $merge approach is nearly 4X faster.
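The rates above follow directly from the reported timings (ops/sec figures are integer-truncated, matching the script's output):

```python
# Recompute throughput from the reported millisecond timings.
n = 50013                       # documents processed
insert_rate = int(n / 1.454)    # 1454 millis to insert
merge_rate  = int(n / 7.144)    # 7144 millis to merge
total_rate  = int(n / 8.598)    # 8598 total millis
loop_rate   = int(n / 33.452)   # 33452 millis for loop update
speedup = 33452 / 8598          # simple loop vs. temp collection + $merge

print(insert_rate, merge_rate, total_rate, loop_rate, round(speedup, 2))
```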