
How to update / insert multiple documents based on condition in one query MongoDB

I have the following structure for 1 document:

{
    "uid": uid,
    "at": at,
    "url": url,
    "members": members
}

I would like to bulk create the documents, or update them if they already exist based on their uid, as I am passing many objects like the above in one request.

I want something that looks like this (in Python):

ids = []
for item in items:
  ids.append(item["uid"])
db.collection.update_many({"uid": {"$in": ids}}, {"$set": items}, upsert=True)

But will it update the correct documents according to the correct ids and insert the ones where uid doesn't exist?

I would like to perform an insert-or-update: whether or not the object already exists, it should either be inserted or replaced based on its uid.

Thanks in advance

There is no direct way to upsert a list of keys with a list of contents, e.g. (this is pseudocode, not valid MQL):

keys = ['A','B'];
content = [{key:'A',name:'X',pos:1},{key:'B',name:'Y',pos:2}];
db.collection.updateMany(keys, {$set: content});

The updateMany function is designed to filter a relatively small number of docs and apply a (mostly) constant update pattern, which is NOT the desired operation here.

It is possible to do this easily in two steps:

  1. Load your updates and potential inserts into a temp collection.
  2. Use the $merge operator to overlay the material onto the target collection.
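In pymongo terms, the two steps might look like the sketch below. The helper names (`build_merge_pipeline`, `bulk_upsert`) are illustrative, not pymongo APIs, and the collection names mirror the shell example that follows; `db` is assumed to be a `pymongo.database.Database` (or any compatible object).

```python
def build_merge_pipeline(target, on_field="uid"):
    """Aggregation pipeline that overlays the staged docs onto `target`."""
    return [
        {"$project": {"_id": False}},  # exclude _id to prevent conflict with the target
        {"$merge": {
            "into": target,
            "on": on_field,
            "whenMatched": "replace",
            "whenNotMatched": "insert",
        }},
    ]

def bulk_upsert(db, items, target="foo", temp="TEMP1"):
    """Step 1: stage the batch in a temp collection; step 2: $merge it onto target."""
    # $merge requires a unique index on the 'on' field; creating it is a
    # no-op if an identical index already exists.
    db[target].create_index("uid", unique=True)
    db[temp].drop()              # OK if it does not exist
    db[temp].insert_many(items)  # stage updates and potential inserts
    db[temp].aggregate(build_merge_pipeline(target))
```

A call like `bulk_upsert(db, items)` then performs the whole overlay server-side in one aggregation.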

Consider this existing target collection foo:

db.foo.insert([
    {uid: 0, at: "A", members: [1,2]},
    {uid: 1, at: "B", members: [1,2]},
    {uid: 2, at: "C", members: [1,2]},
    {uid: 3, at: "D", members: [1,2]}
]);

Assume we construct an array of items similar to what the OP posted:

var items = [
    {uid:1, at:"XX"},   // change
    {uid:3, at:"XX", members:[3,4]},  // change
    {uid:7, at:"Q", members:[3,4]},   // new
    {uid:8, at:"P", members:[3,4]}    // new
];

The following pipeline will merge these onto the target collection:


db.TEMP1.drop(); // OK if does not exist

// Blast items into the TEMP1 collection.  You could use the bulk
// insert functions https://docs.mongodb.com/manual/reference/method/Bulk.insert/ if you prefer,
// or in theory even use mongoimport outside of this script to bulk
// load material into TEMP1.
db.TEMP1.insert(items); 

db.TEMP1.aggregate([
    {$project: {_id:false}}, // must exclude _id to prevent conflict with target 'foo'
    {$merge: {
        into: "foo",  // ah HA!  
        on: [ "uid" ],
        whenMatched: "replace",
        whenNotMatched: "insert"
    }}
]);

db.foo.find();
{ "_id" : 0, "uid" : 0, "at" : "A", "members" : [ 1, 2 ] }  // unchanged
{ "_id" : 1, "uid" : 1, "at" : "XX" }     // modified and members gone
{ "_id" : 2, "uid" : 2, "at" : "C", "members" : [ 1, 2 ] }  // unchanged
{ "_id" : 3, "uid" : 3, "at" : "XX", "members" : [ 3, 4 ] }  // modified
{ "_id" : ObjectId("6228be517dec6ed6fa40cbe3"), "uid" : 7, "at" : "Q", "members" : [ 3, 4 ] }   // inserted
{ "_id" : ObjectId("6228be517dec6ed6fa40cbe4"), "uid" : 8, "at" : "P", "members" : [ 3, 4 ] }   // inserted

The caveat is that a unique index must exist on uid in the target collection, but that is probably what you would want anyway for such an operation to be reasonably performant.

This approach is more client-side-data oriented, i.e. it assumes the client side is building up big, complete structures of data to either overlay or insert, as opposed to update, which is more "surgical" in its ability to selectively touch as few fields as possible.

For a quantitative comparison, 2 million docs sized roughly as above were loaded into foo. A script that uses bulk operations loaded 50,000 items into TEMP1, of which half had a matching uid (for update) and half did not (for insert), followed by the running of the $merge script. On a MacBook Pro with 16GB RAM and an SSD disk, against a v4.4.2 server, the times are:

1454 millis to insert 50013; 34396 ins/sec
7144 millis to merge; 7000 upd/sec
8598 total millis; 5816 ops/sec

If done in a simple loop:

33452 millis for loop update; 1495 upd/sec
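For reference, the simple loop benchmarked here can be written in pymongo as one upserting replace per item; `loop_upsert` is an illustrative name, and the target collection `foo` is assumed:

```python
def loop_upsert(db, items, target="foo"):
    # One round trip per item: replace the doc with this uid, or insert it
    # if no such doc exists.
    for item in items:
        db[target].replace_one({"uid": item["uid"]}, item, upsert=True)
```

pymongo also offers `bulk_write` with a list of `ReplaceOne(..., upsert=True)` requests, which batches the same per-uid replacements into fewer round trips.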

So the temp collection and $merge approach is nearly 4X faster.
