How can I increase Mongoose/MongoDB create and update performance for a large number of entries
I've got an Express application that's using Mongoose/MongoDB and am hoping to find the most efficient way to create/update in bulk (all in a single database operation if possible?).
Users upload a CSV on the frontend that is converted to a JSON array of objects and sent to an Express backend. The array ranges anywhere from ~3,000 entries to upwards of ~50,000 and is often a combination of new entries that need to be created as well as existing entries that need to be updated. Each entry is called a Deal.
Here is my current (not very performant) solution:
const deals = [
  { deal_id: '887713', foo: 'data', bar: 'data' },
  { deal_id: '922257', foo: 'data', bar: 'data' }
] // each deal contains 5 key/value pairs in the real data array

const len = deals.length
const Model = models.Deal
let created = 0
let updated = 0
let errors = 0

for (let i = 0; i < len; i++) {
  const deal = deals[i]
  const exists = await Model.findOne({ deal_id: deal.deal_id })
  if (exists) {
    exists.foo = deal.foo
    exists.bar = deal.bar
    await exists.save()
    updated += 1
  } else {
    try {
      await Model.create(deal)
      created += 1
    } catch (e) {
      errors += 1
    }
  }
}
Currently the combination of findOne/save or findOne/create is taking approximately 200-300ms for every Deal. For the low end of 3,000 entries, that results in 10-15 minutes to process.
I'm not opposed to circumventing Mongoose and using MongoDB directly if that helps.
If possible, I'd like to maintain the ability to count the number of items that were updated and created, as well as the number of errors (this is sent in the response to give users a sense of what succeeded and what failed) - but this is not critical.
Thanks in advance! :)
You'd want to do this with as few database requests as possible. First, you can fetch all relevant documents in one find statement using the $in operator: https://docs.mongodb.com/manual/reference/operator/query/in/
const deals = [
  { deal_id: '887713', foo: 'data', bar: 'data' },
  { deal_id: '922257', foo: 'data', bar: 'data' }
]

const ids = deals.map(deal => deal.deal_id) // an array of all deal_id values
const documents = await Model.find({ deal_id: { $in: ids }})
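If you only need the created/updated counts up front, one way (a sketch, untested, assuming `deal_id` is the unique key) is to partition the incoming array against the ids returned by that query:

```javascript
// Partition incoming deals into updates vs. creates, given the ids
// that already exist in the database (from the $in query above).
function partitionDeals (deals, existingIds) {
  const existing = new Set(existingIds)
  const toUpdate = deals.filter(deal => existing.has(deal.deal_id))
  const toCreate = deals.filter(deal => !existing.has(deal.deal_id))
  return { toUpdate, toCreate }
}

const sample = [
  { deal_id: '887713', foo: 'data', bar: 'data' },
  { deal_id: '922257', foo: 'data', bar: 'data' }
]
// Suppose only '887713' already exists in the collection:
const { toUpdate, toCreate } = partitionDeals(sample, ['887713'])
```

The Set lookup keeps this O(n) even for ~50,000 entries, instead of scanning the existing-id list for every deal.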
Now we'll make one query to update everything, with the property upsert set to true: https://docs.mongodb.com/manual/reference/method/db.collection.update/ This makes sure that if a document does not already exist, it is created automatically.
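As a sketch of what a single upserted update looks like (untested; `buildUpsert` is just an illustrative helper, not part of any API), the filter/update/options triple can be built like this:

```javascript
// Build the arguments for a single upserted update.
// With Mongoose this would be called as:
//   await Model.updateOne(filter, update, options)
function buildUpsert (deal) {
  return {
    filter: { deal_id: deal.deal_id }, // match on the unique deal_id
    update: { $set: deal },            // overwrite the deal's fields
    options: { upsert: true }          // create the document if no match
  }
}

const { filter, update, options } = buildUpsert({ deal_id: '887713', foo: 'data', bar: 'data' })
```

Doing one updateOne per deal would still mean one round trip each, which is why the next step batches them all into a single bulkWrite.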
For bulk updating (updating many documents at the same time), the most efficient approach is to bypass Mongoose and use the MongoDB driver directly with the bulkWrite command: https://docs.mongodb.com/manual/reference/method/db.collection.bulkWrite/
const operations = deals.map(deal => ({
  updateOne: {
    filter: {
      deal_id: deal.deal_id
    },
    update: {
      $set: deal
    },
    upsert: true
  }
}))
const result = await Model.collection.bulkWrite(operations, { ordered: false })
Above I also set { ordered: false }, which tells MongoDB to insert as fast as possible without regard to the order of the array it was given. It also continues processing the remaining documents even if one fails, as stated on the bulkWrite documentation page.
The result object from a bulkWrite looks like this:
{
  "acknowledged" : true,
  "deletedCount" : 1,
  "insertedCount" : 2,
  "matchedCount" : 2,
  "upsertedCount" : 0,
  "insertedIds" : {
    "0" : 4,
    "1" : 5
  },
  "upsertedIds" : { }
}
This means you get counts of how many documents matched, how many of those were updated, and which documents were created (upsertedIds). This is also stated in the documentation for bulkWrite.
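So the created/updated counters the question asks for can be derived from that result object. A minimal sketch (untested; it assumes upsert-style updateOne operations, where upsertedCount is the number of created documents and modifiedCount, falling back to matchedCount, covers the updates):

```javascript
// Derive the question's created/updated counters from a bulkWrite result.
function summarize (result) {
  return {
    created: result.upsertedCount,
    updated: result.modifiedCount !== undefined
      ? result.modifiedCount
      : result.matchedCount
  }
}

// Example with a hypothetical result: 2 deals matched and updated,
// 1 deal did not exist and was upserted (created).
const counts = summarize({
  acknowledged: true,
  matchedCount: 2,
  modifiedCount: 2,
  upsertedCount: 1,
  upsertedIds: { '2': 'someObjectId' }
})
```

Errors per document are available by catching the BulkWriteError thrown when { ordered: false } operations fail; its writeErrors array lists each failed operation.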
A good practice on large data sets is to chunk the bulkWrite into smaller operation arrays to increase performance. A small-to-medium MongoDB server should be fine with a few thousand documents at a time.
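That chunking can be sketched like this (untested; the batch size of 1,000 is an assumption you should tune for your server):

```javascript
// Split a large operations array into batches of at most `size` elements.
function chunk (array, size) {
  const batches = []
  for (let i = 0; i < array.length; i += size) {
    batches.push(array.slice(i, i + size))
  }
  return batches
}

// Usage sketch with the `operations` array built above:
//   for (const batch of chunk(operations, 1000)) {
//     await Model.collection.bulkWrite(batch, { ordered: false })
//   }

// e.g. 2,500 operations split into batches of 1,000:
const batches = chunk(new Array(2500).fill(0), 1000)
```

Running the batches sequentially keeps memory and server load bounded; you can sum the per-batch result counters to keep the created/updated totals.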
Please note that none of the code examples are tested; the goal was to point you in the right direction and illustrate some good practices. Best of luck!