
MongoDB - delete docs from a collection that do not have a unique value

I have a collection of objects such as this:

{"_id":"...", "user":"foo", "value":"a"}, // this one stays coz its user is foo
{"_id":"...", "user":"bar", "value":"a"}, // remove this one
{"_id":"...", "user":"baz", "value":"a"}, // remove this one
{"_id":"...", "user":"qux", "value":"b"}, // this one has unique value so it doesn't get deleted

I would like to find and delete all objects that have a duplicate value, except if user is foo.

Is there a JS mongo shell approach for this?

OK, this isn't tested, but here you go. It assumes you are using Mongoose to interact with the database.

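let values = [];
let deleteIds = [];

// Keeps the first document seen for each value (not necessarily foo's)
myModel.find({}).then(docs => {
    docs.forEach(d => {
        // indexOf returns -1 (truthy) on a miss, so test membership explicitly
        if (values.includes(d.value)) {
            deleteIds.push(d._id);    // value already seen: mark as duplicate
        } else {
            values.push(d.value);     // first occurrence is kept
        }
    });

    // One bulk delete; a Mongoose query only runs once it is awaited or returned
    return myModel.deleteMany({_id: {$in: deleteIds}});
});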

I fixed this by using this block of code (this is not full code for this functionality):

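let query = {
  user: targetedUser
}
// Depending on the driver version, the projection may need to be passed
// as { projection: ... } instead of as the second argument
let projection = {
  _id: 0, id: 1, user: 1
}

let pending = []   // delete promises still in flight

collection.find(query, projection)
      .on('data', doc => {
        // $ne matches the user exactly; a RegExp would also spare names that merely contain targetedUser
        pending.push(collection.deleteMany({id: doc.id, user: {$ne: targetedUser}}))
      })
      .on('end', _ => {
        // close the connection only after every delete has finished
        Promise.all(pending).then(() => db.close())
      })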

Basically, targetedUser is the user whose documents you want to keep; every document from another user that duplicates one of them is removed. In other words: remove all duplicates belonging to other users while keeping them for one specific user.

This is a very specific case and may not fit the usual duplicate-removal problem. The point of this answer, though, is that while this code might look like it will eat all the RAM, it didn't take more than 20MB for 3 million records, and it was fast compared with the other implementations I had tried.
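To make the rule concrete, here is a minimal in-memory sketch of the same selection logic, with no database involved. It assumes the documents look like the ones in the question (fields `_id`, `user`, `value`) rather than the `id` field used in the snippet above, and `idsToRemove` is a hypothetical helper name, not part of any driver.

```javascript
// Hypothetical helper: returns the _id of every document that shares a value
// with targetedUser but belongs to a different user.
function idsToRemove(docs, targetedUser) {
  // Values owned by the user whose documents must be kept
  const keptValues = new Set(
    docs.filter(d => d.user === targetedUser).map(d => d.value)
  );
  // Documents from anyone else that repeat one of those values
  return docs
    .filter(d => d.user !== targetedUser && keptValues.has(d.value))
    .map(d => d._id);
}

// Sample data mirroring the question (the numeric _id values are made up)
const sample = [
  { _id: 1, user: "foo", value: "a" },
  { _id: 2, user: "bar", value: "a" },
  { _id: 3, user: "baz", value: "a" },
  { _id: 4, user: "qux", value: "b" },
];

console.log(idsToRemove(sample, "foo")); // [ 2, 3 ]
```

Like the streaming version, this only removes documents that clash with one of targetedUser's values; duplicates shared only among other users are left alone.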

This is my take on fetching duplicates in MongoDB. aggregate is a helpful function to look into: you can chain multiple pipeline stages to get to where you want.

  1. match all documents whose user is not equal to foo
  2. group them by value, which becomes the _id of each group; add 1 to count for every document in the group and push each document's original _id into an array called docIds
  3. from this new set, keep only the groups that have count > 1
  4. unwind docIds so that each _id ends up in its own document (please check the docs for a better explanation)

This will give you the documents whose value appears more than once. Once you are happy with the result set, you can perform the delete operation on those documents. I haven't run this myself, so let us know how it goes.

db.collection.aggregate([{
        $match: {
            "user": {
                $ne: "foo"
            }
        }
    }, {
        $group: {
            _id: "$value",
            docIds: {
                $push: "$_id"
            },
            count: {
                $sum: 1
            }
        }
    }, {
        $match: {
            "count": {
                $gt: 1
            }
        }
    }, {
        $unwind: "$docIds"
    }
])
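Once the result set looks right, the delete step might be sketched like this. It assumes the pipeline is run without the final $unwind, so each result still carries its docIds array. `buildDeleteFilter` is a hypothetical helper, and the sample `groups` array is made up to mirror the question's data.

```javascript
// Hypothetical helper: turns the grouped results (one entry per duplicated value)
// into a single filter that matches every collected _id.
function buildDeleteFilter(groups) {
  const ids = groups.flatMap(g => g.docIds);
  return { _id: { $in: ids } };
}

// Shape of the results after the count > 1 match (illustrative data only)
const groups = [
  { _id: "a", docIds: ["id-bar", "id-baz"], count: 2 },
];

const filter = buildDeleteFilter(groups);
console.log(JSON.stringify(filter));

// In the mongo shell (assumed usage, not run here):
//   const found = db.collection.aggregate([ /* stages 1 to 3 */ ]).toArray();
//   db.collection.deleteMany(buildDeleteFilter(found));
```

One thing to check before deleting: because foo is filtered out before the count, a value shared by foo and only one other user ends up with a count of 1 and will not be picked up by this pipeline.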
