
Mongodb - delete docs from collection that do not have unique value

I have a collection of objects such as this:

{"_id":"...", "user":"foo", "value":"a"}, // this one stays coz its user is foo
{"_id":"...", "user":"bar", "value":"a"}, // remove this one
{"_id":"...", "user":"baz", "value":"a"}, // remove this one
{"_id":"...", "user":"qux", "value":"b"}, // this one has unique value so it doesn't get deleted

I would like to find and delete all objects that have a duplicate value, except when the user is foo.

Is there a JS mongo shell approach for this?

Ok, this isn't tested but here ya go... This assumes you're using Mongoose to interact with the database...

let values = [];
let deleteIds = [];

myModel.find({}).then(docs => {
    docs.forEach(d => {
        if (values.indexOf(d.value) !== -1) { // bug fix: indexOf returns -1 when absent, which is truthy
            deleteIds.push(d._id);
        } else {
            values.push(d.value);
        }
    })

    deleteIds.forEach(id => {
        myModel.findOneAndRemove({_id: id});
    });
});
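Since the question asks for a plain mongo shell / JS approach, here is the same dedup rule written as a pure function so it can be exercised without a database. This is a sketch, not the answer author's code: it also special-cases user "foo" (which the Mongoose snippet above does not), and the sample `_id` values are made up for illustration.

```javascript
// Pure-JS sketch of the dedup rule from the question: docs whose user is
// "foo" always stay and claim their value; of the rest, the first doc per
// value stays and every later duplicate has its _id collected for deletion.
function findDeleteIds(docs) {
  const claimedValues = new Set();
  const deleteIds = [];

  // Pass 1: "foo" docs always stay, so their values are claimed up front.
  for (const d of docs) {
    if (d.user === "foo") claimedValues.add(d.value);
  }

  // Pass 2: keep the first holder of each remaining value, mark the rest.
  for (const d of docs) {
    if (d.user === "foo") continue;          // always kept
    if (claimedValues.has(d.value)) {
      deleteIds.push(d._id);                 // duplicate value -> delete
    } else {
      claimedValues.add(d.value);            // first holder stays
    }
  }
  return deleteIds;
}

// Sample data shaped like the question's collection (hypothetical _ids)
const docs = [
  { _id: "1", user: "foo", value: "a" },
  { _id: "2", user: "bar", value: "a" },
  { _id: "3", user: "baz", value: "a" },
  { _id: "4", user: "qux", value: "b" },
];
console.log(findDeleteIds(docs)); // → [ '2', '3' ]
```

The returned ids could then be removed in one call, e.g. `myModel.deleteMany({_id: {$in: deleteIds}})`, instead of one `findOneAndRemove` per id.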

I fixed this by using this block of code (this is not the full code for this functionality):

let query = {
  user:targetedUser
}
let projection = {
  _id:0, id:1, user:1
}


collection.find(query, projection)
      .on('data', doc => {
        collection.deleteMany({id:doc.id, user: {$not: new RegExp(targetedUser)}})
      })
      .on('end', _=> {
        db.close()
      })

Basically, the targetedUser variable holds the value of the objects you want to keep, while all other duplicates that do not match that value are removed. In other words: remove all duplicates from other users while keeping them for the specific user.

This is a very specific case and might differ for more usual problems. But the point of this answer is that, although this code might look like it's going to eat all the RAM, it took no more than 20 MB for 3 million records, and it's fast compared to the other implementations I've tried so far.
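The filter in that deleteMany call combines an `id` match with `user: {$not: new RegExp(targetedUser)}`. A small sketch of that matching rule in plain JS (hypothetical helper name, matching logic only) makes its behavior, and one caveat, easier to see:

```javascript
// Sketch of the deleteMany filter's rule: a doc is deleted when it shares
// the id of a kept doc but its user does NOT match the targeted user.
function wouldDelete(doc, keptDoc, targetedUser) {
  return doc.id === keptDoc.id && !new RegExp(targetedUser).test(doc.user);
}

const kept = { id: 42, user: "foo" };
console.log(wouldDelete({ id: 42, user: "bar" }, kept, "foo")); // true  (duplicate, removed)
console.log(wouldDelete({ id: 42, user: "foo" }, kept, "foo")); // false (targeted user, kept)
// Caveat: the regex is unanchored, so a user like "foobar" also matches
// "foo" and would be kept; an exact-match filter avoids that.
console.log(wouldDelete({ id: 42, user: "foobar" }, kept, "foo")); // false
```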

This is my take on fetching duplicates in MongoDB. aggregate is a helpful function to look into. You can apply multiple pipeline stages to get to where you want:

  1. match all users that are not equal to foo
  2. group them by value, which becomes the _id; count each document found in the set and push the original _ids into an array called docIds
  3. from this new set, get all rows/docs that have count > 1
  4. unwind (please check the docs for a better explanation)

This will give you the documents whose value appears more than once. You can then perform the delete operation for these documents once you are happy with the result set. I haven't manually run this... Let us know..

db.collection.aggregate([{
            $match: {
                "user": {
                    $ne: "foo"
                }
            }
        }, {
            $group: {
                _id: "$value",
                docIds: {
                    $push: "$_id"
                },
                count: {
                    $sum: 1
                }
            }
        }, {
            $match: {
                count: {
                    $gt: 1
                }
            }
        }, {
            $unwind: "$docIds"
        }
    ])
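To preview what the four stages produce without touching a database, here is a pure-JS simulation of the pipeline (match, group, count filter, unwind) on the sample docs from the question. This is a sketch of the stage semantics, not the real aggregation engine, and the `_id` values are made up:

```javascript
// Sample docs shaped like the question's collection (hypothetical _ids)
const docs = [
  { _id: "1", user: "foo", value: "a" },
  { _id: "2", user: "bar", value: "a" },
  { _id: "3", user: "baz", value: "a" },
  { _id: "4", user: "qux", value: "b" },
];

// 1. $match: keep docs whose user is not "foo"
const matched = docs.filter(d => d.user !== "foo");

// 2. $group by value: collect _ids into docIds, count docs per group
const groups = {};
for (const d of matched) {
  groups[d.value] ??= { _id: d.value, docIds: [], count: 0 };
  groups[d.value].docIds.push(d._id);
  groups[d.value].count++;
}

// 3. $match: keep only groups with count > 1 (i.e. duplicated values)
const dups = Object.values(groups).filter(g => g.count > 1);

// 4. $unwind docIds: one output doc per duplicate _id
const unwound = dups.flatMap(g => g.docIds.map(id => ({ ...g, docIds: id })));

console.log(unwound);
// → [ { _id: 'a', docIds: '2', count: 2 },
//     { _id: 'a', docIds: '3', count: 2 } ]
```

Each unwound doc carries one duplicate's `_id` in `docIds`, so the follow-up delete in the shell would iterate the aggregate cursor and remove by that id, e.g. `db.collection.aggregate(pipeline).forEach(d => db.collection.deleteOne({_id: d.docIds}))`.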
