MongoDB - delete docs from collection that do not have unique value
I have a collection of objects such as this:
{"_id":"...", "user":"foo", "value":"a"}, // this one stays coz its user is foo
{"_id":"...", "user":"bar", "value":"a"}, // remove this one
{"_id":"...", "user":"baz", "value":"a"}, // remove this one
{"_id":"...", "user":"qux", "value":"b"}, // this one has unique value so it doesn't get deleted
I would like to find and delete all objects that have a duplicate value, except when the user is foo.
Is there a JS mongo shell approach for this?
OK, this isn't tested, but here ya go... This assumes you're using Mongoose to interact with the database...
let values = [];
let deleteIds = [];
myModel.find({}).then(docs => {
  docs.forEach(d => {
    if (values.indexOf(d.value) !== -1) { // value seen before -> duplicate
      deleteIds.push(d._id);
    } else {
      values.push(d.value); // first occurrence of this value stays
    }
  });
  deleteIds.forEach(id => {
    // without exec() (or await/then), a Mongoose query never runs
    myModel.findOneAndRemove({_id: id}).exec();
  });
});
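The duplicate-detection step above can be sketched as a database-free function in plain Node.js (the function name is illustrative, not part of the original answer). One caveat: it keeps whichever document for a given value the cursor happens to return first, which is not guaranteed to be foo's unless you sort first or check the user field explicitly:

```javascript
// Given an array of docs, return the _ids of every doc whose value has
// already been seen; the first occurrence of each value is kept.
function collectDuplicateIds(docs) {
  const seen = new Set();
  const deleteIds = [];
  for (const d of docs) {
    if (seen.has(d.value)) {
      deleteIds.push(d._id); // value already seen -> mark for deletion
    } else {
      seen.add(d.value);     // first occurrence stays
    }
  }
  return deleteIds;
}
```

With Mongoose, the collected ids could then be removed in a single query instead of one delete per document, e.g. `myModel.deleteMany({ _id: { $in: deleteIds } })`.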
I fixed this by using this block of code (this is not the full code for this functionality):
let query = {
  user: targetedUser
};
let projection = {
  _id: 0, id: 1, user: 1
};
collection.find(query, projection)
  .on('data', doc => {
    // for each doc owned by the targeted user, delete docs with the
    // same `id` that belong to any other user
    collection.deleteMany({id: doc.id, user: {$not: new RegExp(targetedUser)}});
  })
  .on('end', () => {
    db.close();
  });
Basically, the targetedUser variable holds the user whose objects you want to keep while removing all others that are duplicates and do not match that value. In other words: remove all duplicates from other users while keeping them for the specific user.
This is a very specific case and might differ for more usual problems. But the point of this answer is that, while this code might look like it's going to eat all the RAM, it took no more than 20MB for 3 million records, and it's also fast compared to the other implementations I've tried so far.
This is my take on fetching duplicates in MongoDB. aggregate is a helpful function to look into. You can apply multiple pipeline stages to get to where you want. Group by value, which becomes the _id, and increment the count for each $_id (original) found in the document set. Push the items into an array called docIds. This will give you documents whose value appeared more than once. You can then perform the delete operation on these documents, once you are happy with the result set. I haven't manually run this... Let us know..
db.collection.aggregate([{
  $match: {
    "user": {
      $ne: "foo"
    }
  }
}, {
  $group: {
    _id: "$value",
    docIds: {
      $push: "$_id"
    },
    count: {
      $sum: 1
    }
  }
}, {
  $match: {
    count: {
      $gt: 1
    }
  }
}, {
  $unwind: "$docIds"
}])
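As a sanity check, here is a database-free sketch in plain Node.js of what the pipeline computes, run on the question's sample documents; the final deleteMany is the assumed follow-up step the answer alludes to, not part of the original pipeline:

```javascript
// Sample documents from the question (assumed data)
const docs = [
  { _id: 1, user: "foo", value: "a" },
  { _id: 2, user: "bar", value: "a" },
  { _id: 3, user: "baz", value: "a" },
  { _id: 4, user: "qux", value: "b" },
];

// $match: drop user "foo"; $group: by value, collecting _ids and a count
const groups = new Map();
for (const d of docs.filter((d) => d.user !== "foo")) {
  const g = groups.get(d.value) ?? { docIds: [], count: 0 };
  g.docIds.push(d._id);
  g.count += 1;
  groups.set(d.value, g);
}

// second $match: keep count > 1; $unwind: flatten docIds to one id per row
const idsToDelete = [...groups.values()]
  .filter((g) => g.count > 1)
  .flatMap((g) => g.docIds);

console.log(idsToDelete); // prints [ 2, 3 ]
```

In mongosh, the ids surfaced this way could then be passed to `db.collection.deleteMany({ _id: { $in: idsToDelete } })` once you are happy with the result set.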