Mongoose: update millions of records while extracting information

Question

We have a production database with over 5 million customer records. Each customer document has an embedded array of the licenses that customer has applied for. An example customer document is as follows:

{
    _id: ObjectId('...'),
    phoneNumber: 'xxxx',
    // Other customer fields
    licenses: [
        {
            _id: ObjectId('...'),
            state: 'PENDING',
            expired: false,
            createdAt: ISODate(''),
            // Other license fields
        },
        // More Licenses for this customer
    ]
}

I have been tasked with changing the state of every PENDING license applied for during the month of September to REJECTED, and with sending an SMS to every customer whose pending license was just rejected.

Using model.where(condition).countDocuments(), I found that over 3 million customers (not licenses) match these criteria. Each customer has an average of 9 licenses.

I need help coming up with a strategy that won't slow down the system while this operation runs. For scale, this is around 17GB of data.

Sending the SMS is fine; I can queue the details for the SMS service. My challenge is processing the licenses while extracting the relevant information for the SMS.

Answer 1

First of all, create an index on the collection:

db.collection.createIndex( { "licenses.state": 1 } )

Then you should do something like this:

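// Match only customers that have a PENDING license, so the index above can be used.
model.updateMany({ 'licenses.state': 'PENDING' }, {
    $set: {
        'licenses.$[elem].state': 'REJECTED'
    }
}, {
    // Only array elements that satisfy every condition here are updated.
    arrayFilters: [{
        'elem.state': 'PENDING',
        'elem.createdAt': { $gte: new Date(....) }  // ISODate() exists only in the mongo shell
    }]
}).then(function (result) {
    // result summarizes how many documents were matched and modified
});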
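
The filter above only puts a lower bound on createdAt, while the question asks for a single month. A minimal sketch of building a bounded range follows. The helper name buildRejectUpdate is hypothetical, and the year 2020 and the use of UTC month boundaries are assumptions, since the question states neither.

```javascript
// Hypothetical helper: builds the three arguments for updateMany so that only
// PENDING licenses created in the given month (UTC, an assumption) are touched.
function buildRejectUpdate(year, monthIndex) {
  // monthIndex is zero-based (8 = September). Date.UTC handles rollover:
  // a month value of 12 becomes January of the next year.
  const start = new Date(Date.UTC(year, monthIndex, 1));
  const end = new Date(Date.UTC(year, monthIndex + 1, 1));
  const range = { $gte: start, $lt: end }; // half-open interval [start, end)

  return {
    // $elemMatch requires one array element to satisfy both conditions.
    filter: { licenses: { $elemMatch: { state: 'PENDING', createdAt: range } } },
    update: { $set: { 'licenses.$[elem].state': 'REJECTED' } },
    options: { arrayFilters: [{ 'elem.state': 'PENDING', 'elem.createdAt': range }] },
  };
}

// Usage, assuming `model` is the Mongoose customer model and the year is 2020:
// const { filter, update, options } = buildRejectUpdate(2020, 8);
// await model.updateMany(filter, update, options);
```

Using $elemMatch in the document filter matters here: two separate conditions on licenses.state and licenses.createdAt could be satisfied by different licenses of the same customer.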

If you have a replica set and the updates run on the primary, reads served from the secondaries should not be affected.

If you want to split the update into batches, you can use the _id field (already indexed). How you split the range depends on your _id format.
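
One way the batching idea could look is sketched below: page through matching customers in _id order, update each page, then queue the SMS for that page. This is an illustration under assumptions, not the answerer's code. `model` is assumed to be the Mongoose customer model, `enqueueSms` is a hypothetical function that hands a phone number to the SMS queue, and the default batch size of 1000 is arbitrary.

```javascript
// Sketch only: rejects PENDING licenses created in [start, end) one page at a
// time, and returns the number of customers processed.
async function rejectPendingInBatches(model, start, end, enqueueSms, batchSize = 1000) {
  const range = { $gte: start, $lt: end };
  let lastId = null;
  let processed = 0;

  for (;;) {
    // Customers with at least one PENDING license created in the range.
    const filter = { licenses: { $elemMatch: { state: 'PENDING', createdAt: range } } };
    if (lastId !== null) filter._id = { $gt: lastId }; // resume after the previous page

    // Read one page ordered by the indexed _id; fetch only what the SMS needs.
    const customers = await model
      .find(filter)
      .sort({ _id: 1 })
      .limit(batchSize)
      .select({ phoneNumber: 1 })
      .lean();
    if (customers.length === 0) break;

    const ids = customers.map((c) => c._id);

    // Update only the matching array elements of this page's customers.
    await model.updateMany(
      { _id: { $in: ids } },
      { $set: { 'licenses.$[elem].state': 'REJECTED' } },
      { arrayFilters: [{ 'elem.state': 'PENDING', 'elem.createdAt': range }] }
    );

    // Queue the SMS only after this page's update has succeeded.
    for (const c of customers) await enqueueSms(c.phoneNumber);

    lastId = ids[ids.length - 1];
    processed += customers.length;
  }
  return processed;
}
```

Small pages keep each write short, and a pause between pages can be added if the primary shows load. Note that a crash between the update and the queueing step would leave a page updated without its SMS, so a real job would likely need to record progress somewhere.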

Answer 1: 2 2020-10-03 21:05:46
