I have a collection of objects such as these:
{"_id":"...", "user":"foo", "value":"a"}, // this one stays coz its user is foo
{"_id":"...", "user":"bar", "value":"a"}, // remove this one
{"_id":"...", "user":"baz", "value":"a"}, // remove this one
{"_id":"...", "user":"qux", "value":"b"}, // this one has unique value so it doesn't get deleted
I would like to find and delete all objects that have a duplicate value, except if the user is foo.
Is there a JS mongo shell approach for this?
OK, this isn't tested, but here you go. This assumes you are using Mongoose to interact with the database.
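const seenValues = new Set();
const deleteIds = [];
myModel.find({}).then(docs => {
  docs.forEach(d => {
    // Keep the first document seen for each value; mark later ones for deletion.
    // Note: this does not check the user field, it relies on document order.
    if (seenValues.has(d.value)) {
      deleteIds.push(d._id);
    } else {
      seenValues.add(d.value);
    }
  });
  // Delete all marked documents in one call; returning the query lets it run and errors propagate.
  return myModel.deleteMany({_id: {$in: deleteIds}});
});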
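The loop above keeps whichever document comes first for each value, which only matches the question if foo's document happens to be read first. As a rough sketch of the selection rule the question describes, here is a plain JavaScript helper that works on documents already loaded into memory. The name idsToDelete and the numeric _id values are hypothetical, chosen only for illustration, and it assumes "duplicate" means the value occurs more than once across all users.

```javascript
// Hypothetical helper: returns the _id of every document that should be deleted.
// Assumption: a value is "duplicate" if it appears more than once across all users.
function idsToDelete(docs, keepUser) {
  // Count how many documents share each value.
  const counts = new Map();
  for (const d of docs) {
    counts.set(d.value, (counts.get(d.value) || 0) + 1);
  }
  // Delete a document when its value is duplicated and it does not belong to keepUser.
  return docs
    .filter(d => counts.get(d.value) > 1 && d.user !== keepUser)
    .map(d => d._id);
}

// Sample data mirroring the question (the _id values are made up).
const sample = [
  { _id: 1, user: "foo", value: "a" },
  { _id: 2, user: "bar", value: "a" },
  { _id: 3, user: "baz", value: "a" },
  { _id: 4, user: "qux", value: "b" },
];
console.log(idsToDelete(sample, "foo")); // [ 2, 3 ]
```

The resulting array could then be passed to a single deleteMany with an $in filter on _id, instead of issuing one delete per document.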
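I fixed this by using this block of code (this is not the full code for this functionality):
let query = {
  user: targetedUser
}
let projection = {
  _id: 0, id: 1, user: 1
}
// Stream every document that belongs to the user we want to keep.
collection.find(query, projection)
  .on('data', doc => {
    // Remove documents with the same id that belong to other users.
    // Note: the regex matches any user containing targetedUser; use $ne for an exact match.
    collection.deleteMany({id: doc.id, user: {$not: new RegExp(targetedUser)}})
  })
  .on('end', _ => {
    db.close()
  })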
Basically, the targetedUser variable is the user whose objects you want to keep while removing all others that are duplicates and do not match that user. In other words, it removes all duplicates belonging to other users while keeping them for the specific user.
This is a very specific case and may not fit the usual version of this problem. The point of this answer is that this code might look like it's going to eat all the RAM, but it didn't take more than 20MB for 3 million records, and it's fast compared to the other implementations I've tried so far.
This is my take on fetching duplicates in MongoDB. aggregate is a helpful function to look into; you can chain multiple pipeline stages to get where you want. Here, the pipeline first filters out documents whose user is foo, then groups by value, which becomes the _id of each group, incrementing count for each document found and pushing each document's original _id into an array called docIds. Keeping only the groups whose count is greater than 1 gives you the documents whose value appears more than once. You can then perform the delete operation for these documents once you are happy with the result set. I haven't run this myself, so let us know how it goes.
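db.collection.aggregate([{
  // Ignore documents that belong to foo.
  $match: {
    "user": {
      $ne: "foo"
    }
  }
}, {
  // Group by value, counting documents and collecting their ids.
  $group: {
    _id: "$value",
    docIds: {
      $push: "$_id"
    },
    count: {
      $sum: 1
    }
  }
}, {
  // Keep only values that appear more than once.
  $match: {
    "count": {
      $gt: 1
    }
  }
}, {
  // Emit one result per document id.
  $unwind: "$docIds"
}
])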
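For the delete step that follows the aggregation, a minimal sketch might look like the one below. It assumes `collection` is a Node.js driver collection (or anything exposing aggregate(...).toArray() and deleteMany(...)), and the function name deleteDuplicatesExcept is hypothetical. Unlike the pipeline above, it counts values across all users before excluding the kept user, so a value shared by foo and just one other user still counts as a duplicate; that is an assumption about what the question wants, not something the original answer states.

```javascript
// Sketch only: `collection` is assumed to behave like a Node.js driver Collection.
async function deleteDuplicatesExcept(collection, keepUser) {
  // Find every value that occurs more than once, across all users.
  const groups = await collection
    .aggregate([
      { $group: { _id: "$value", count: { $sum: 1 } } },
      { $match: { count: { $gt: 1 } } },
    ])
    .toArray();

  const duplicatedValues = groups.map(g => g._id);
  if (duplicatedValues.length === 0) return 0; // nothing to delete

  // Delete documents with a duplicated value unless they belong to keepUser.
  const result = await collection.deleteMany({
    value: { $in: duplicatedValues },
    user: { $ne: keepUser },
  });
  return result.deletedCount;
}
```

Running the aggregation on its own first and inspecting the result is a sensible precaution before letting the deleteMany run against real data.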