Enforce limit on MongoDB bulk API
I'd like to delete a large number of old documents from one collection, so it makes sense to use the bulk API. Deleting them is as simple as:
var bulk = db.myCollection.initializeUnorderedBulkOp();
bulk.find({
    _id: {
        $lt: oldestAllowedId
    }
}).remove();
bulk.execute();
The only problem is that this will attempt to delete every single document matching the criteria, which in this case is millions of documents, so for performance reasons I don't want to delete them all at once. I want to enforce a limit on the operation so that I can do something like

bulk.limit(10000).execute();

and space the operations out by a few seconds to prevent locking the database for longer than necessary. However, I have been unable to find any option that can be passed to bulk to limit the number of operations it executes.
Is there a way to limit bulk operations in this manner?
Before anyone mentions it, I know that bulk will automatically split operations into chunks of 1000 documents, but it will still execute all of those operations sequentially, as fast as it can. This results in a much larger performance impact than I can deal with right now.
You can iterate over the array of _id values of the documents that match your query using the .forEach() method. The best way to return that array is with the .distinct() method. You then use "bulk" operations to remove your documents:
var bulk = db.myCollection.initializeUnorderedBulkOp();
var count = 0;

// Collect the _id values of every document matching the query
var ids = db.myCollection.distinct('_id', { '_id': { '$lt': oldestAllowedId } });

ids.forEach(function(id) {
    bulk.find({ '_id': id }).removeOne();
    count++;
    if (count % 1000 === 0) {
        // Execute every 1000 operations and re-initialize the builder
        bulk.execute();
        // You can pause here to space the batches out,
        // e.g. sleep(ms) in the mongo shell
        bulk = db.myCollection.initializeUnorderedBulkOp();
    }
});

// Flush the remaining queued operations, if any.
// Note: executing an empty bulk throws an error, so only
// execute when the last batch was not an exact multiple of 1000.
if (count % 1000 !== 0) {
    bulk.execute();
}
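The batching logic in the loop above can be seen in isolation: it is equivalent to splitting the array of ids into fixed-size slices and flushing one bulk builder per slice. A minimal sketch in plain JavaScript (the `chunk` helper is hypothetical, not part of the MongoDB API) of how 2500 ids would be grouped into batches of 1000:

```javascript
// Hypothetical helper: split an array into consecutive slices of `size`,
// mirroring how the loop flushes the bulk builder every 1000 operations.
function chunk(ids, size) {
    var batches = [];
    for (var i = 0; i < ids.length; i += size) {
        batches.push(ids.slice(i, i + size));
    }
    return batches;
}

// Example: 2500 ids split into batches of at most 1000
var ids = [];
for (var n = 0; n < 2500; n++) {
    ids.push(n);
}
var batches = chunk(ids, 1000);
console.log(batches.length);    // 3
console.log(batches[2].length); // 500
```

Each batch would then be turned into one `initializeUnorderedBulkOp()` / `execute()` cycle, with whatever pause you want between cycles; in the mongo shell a simple `sleep(2000)` between batches spaces them two seconds apart.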