MongoDB performance issue with arrays

Question

I'm looking for some advice on how to improve the performance of my query.

I have this user model in Mongoose, with an index on interested_cities.

const firebaseToken = new Schema({
  type: { type: String, default: "android", required: true, trim: true },
  device_id: { type: String, required: true, trim: true },
  fcm_token: { type: String, required: true, trim: true },
});

const userSchema = new Schema({
  name: { type: String, required: true, trim: true },
  interested_cities: {
    type: [{ type: String, trim: true, lowercase: true, unique: true }],
    required: false,
    default: [],
  },
  push_notification_tokens_firebase: {
    type: [firebaseToken],
    required: false,
    default: [],
  },
});

userSchema.index({ interested_cities: 1 });

What I want is to query the users who have 'A' or 'B' in their interested_cities array.

My query looks something like this. I only need the Firebase fcm_token from the result.

const involvedUsers = await User.find(
  {
    $or: [
      { interested_cities: { $in: ['A', 'B'] } },
      { phone_number: { $in: adminPhoneNumbersList } },
    ],
  },
  { _id: 1, "push_notification_tokens_firebase.fcm_token": 1 }
);

Currently, the query takes 20 seconds for 14k documents, which needs improvement. Any pointers would be appreciated.

Explain output:

{
  "explainVersion": "1",
  "queryPlanner": {
    "namespace": "production.users",
    "indexFilterSet": false,
    "parsedQuery": { "interested_cities": { "$in": ["A", "B"] } },
    "maxIndexedOrSolutionsReached": false,
    "maxIndexedAndSolutionsReached": false,
    "maxScansToExplodeReached": false,
    "winningPlan": {
      "stage": "PROJECTION_DEFAULT",
      "transformBy": { "_id": 1, "push_notification_tokens_firebase.fcm_token": 1 },
      "inputStage": {
        "stage": "FETCH",
        "inputStage": {
          "stage": "IXSCAN",
          "keyPattern": { "interested_cities": 1 },
          "indexName": "interested_cities_1",
          "isMultiKey": true,
          "multiKeyPaths": { "interested_cities": ["interested_cities"] },
          "isUnique": false,
          "isSparse": false,
          "isPartial": false,
          "indexVersion": 2,
          "direction": "forward",
          "indexBounds": { "interested_cities": ["[\"A\", \"A\"]", "[\"B\", \"B\"]"] }
        }
      }
    },
    "rejectedPlans": []
  }

"executionStats": {
  "executionSuccess": true,
  "nReturned": 6497,
  "executionTimeMillis": 48,
  "totalKeysExamined": 0,
  "totalDocsExamined": 14827,
  "executionStages": {
    "stage": "SUBPLAN",
    "nReturned": 6497,
    "executionTimeMillisEstimate": 46,
    "works": 14829,
    "advanced": 6497,
    "needTime": 8331,
    "needYield": 0,
    "saveState": 14,
    "restoreState": 14,
    "isEOF": 1,
    "inputStage": {
      "stage": "PROJECTION_DEFAULT",
      "nReturned": 6497,
      "executionTimeMillisEstimate": 46,
      "works": 14829,
      "advanced": 6497,
      "needTime": 8331,
      "needYield": 0,
      "saveState": 14,
      "restoreState": 14,
      "isEOF": 1,
      "transformBy": { "_id": 1, "push_notification_tokens_firebase.fcm_token": 1 },
      "inputStage": {
        "stage": "COLLSCAN",
        "filter": {
          "$or": [
            { "interested_cities": { "$in": ["A", "B"] } },
            { "phone_number": { "$in": ["phone numbers", "phone number"] } }
          ]
        },
        "nReturned": 6497,
        "executionTimeMillisEstimate": 41,
        "works": 14829,
        "advanced": 6497,
        "needTime": 8331,
        "needYield": 0,
        "saveState": 14,
        "restoreState": 14,
        "isEOF": 1,
        "direction": "forward",
        "docsExamined": 14827
      }
    }
  },
  "allPlansExecution": []
}

Answer 1

Mongoose optimization:

By default, Mongoose queries return an instance of the Mongoose Document class. Documents are much heavier than vanilla JavaScript objects, because they have a lot of internal state for change tracking. Enabling the lean option tells Mongoose to skip instantiating a full Mongoose document and just give you the POJO.

https://mongoosejs.com/docs/tutorials/lean.html#using-lean

You can enable the lean option on a per-query basis by appending .lean() to the query. If your query returns a lot of documents (here, 6497), this can noticeably improve its speed. The link above explains lean() in more detail.
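As a minimal sketch, this is roughly how the query from the question might look with lean() applied. The function takes the User model as a parameter so the snippet needs no imports, and fetchFcmTokens is a hypothetical helper name, not something from the question.

```javascript
// Sketch only: `User` is the Mongoose model from the question, passed in as a
// parameter. `fetchFcmTokens` is a hypothetical helper name.
async function fetchFcmTokens(User, cities, phoneNumbers) {
  const users = await User.find(
    {
      $or: [
        { interested_cities: { $in: cities } },
        { phone_number: { $in: phoneNumbers } },
      ],
    },
    { _id: 1, "push_notification_tokens_firebase.fcm_token": 1 }
  ).lean(); // plain objects instead of full Mongoose documents

  // Flatten to a list of token strings; users without tokens contribute nothing.
  return users.flatMap((user) =>
    (user.push_notification_tokens_firebase || []).map((t) => t.fcm_token)
  );
}
```

It would be called as `await fetchFcmTokens(User, ['A', 'B'], adminPhoneNumbersList)`.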

Query optimization:

When evaluating the clauses in the $or expression, MongoDB either performs a collection scan or, if all the clauses are supported by indexes, MongoDB performs index scans. That is, for MongoDB to use indexes to evaluate an $or expression, all the clauses in the $or expression must be supported by indexes. Otherwise, MongoDB will perform a collection scan.

https://www.mongodb.com/docs/manual/reference/operator/query/or/#-or-clauses-and-indexes

The query you shared looks like this:
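The rule can be illustrated with a deliberately simplified model. This is not MongoDB's actual query planner: orCanUseIndexes is a hypothetical helper that only checks whether each $or clause filters on a field that some index starts with, and the phone number is a placeholder.

```javascript
// Simplified model of the $or rule, not MongoDB's actual planner:
// an $or can use index scans only if every clause filters on a field
// that some index starts with. `orCanUseIndexes` is a hypothetical helper.
function orCanUseIndexes(orClauses, indexKeyPatterns) {
  // Leading field of each index, e.g. { phone_number: 1, _id: 1 } -> "phone_number".
  const leadingFields = new Set(
    indexKeyPatterns.map((pattern) => Object.keys(pattern)[0])
  );
  // Every clause must have at least one field backed by an index.
  return orClauses.every((clause) =>
    Object.keys(clause).some((field) => leadingFields.has(field))
  );
}

const clauses = [
  { interested_cities: { $in: ["A", "B"] } },
  { phone_number: { $in: ["+10000000000"] } }, // placeholder value
];

// Only the indexes from the question: phone_number is not indexed.
console.log(orCanUseIndexes(clauses, [{ _id: 1 }, { interested_cities: 1 }])); // false

// After adding an index on phone_number.
console.log(
  orCanUseIndexes(clauses, [{ _id: 1 }, { interested_cities: 1 }, { phone_number: 1 }])
); // true
```

The first result matches the COLLSCAN stage in the question's executionStats: one unindexed clause is enough to force a collection scan for the whole $or.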

const involvedUsers = await User.find({
  $or: [
    { interested_cities: { $in: citiesArr } },
    { phone_number: { $in: phonesArr } },
  ],
}, { _id: 1, "push_notification_tokens_firebase.fcm_token": 1 });

Based on the info above, you need the following two indexes (the first is already defined in your schema; the one on phone_number is missing):

userSchema.index({ interested_cities: 1 });
userSchema.index({ phone_number: 1 });

This way, mongo can use the indexes to determine which documents match, fetch only those from disk, apply your projection (_id and push_notification_tokens_firebase.fcm_token) and return the result.

One step further in the optimization would be a covered query, where the index contains every field the query filters on and returns. The indexes for that would look like this:

userSchema.index({ interested_cities: 1, _id: 1, "push_notification_tokens_firebase.fcm_token": 1 });
userSchema.index({ phone_number: 1, _id: 1, "push_notification_tokens_firebase.fcm_token": 1 });

With a covered query, mongo has all the info it needs in the index and never fetches the documents themselves. Be aware that this is unlikely to work with this schema: interested_cities and push_notification_tokens_firebase are both arrays, MongoDB does not allow a compound multikey index over two array fields of the same document, and a multikey index cannot cover a projection on an array field. The two single-field indexes above are therefore the practical option here.

You can check what actually happens by running <your-query>.explain('executionStats'). A covered query reports totalDocsExamined as 0; otherwise, confirm that the plan contains IXSCAN stages rather than COLLSCAN.

Read more about executionStats here: https://www.mongodb.com/docs/manual/reference/explain-results/#mongodb-data-explain.executionStats

I hope this helps!
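To check an explain result without reading the nested JSON by eye, a small helper can list the stages in the plan. This is a sketch: collectStages is a hypothetical name, and the sample object is an abridged copy of the executionStages from the question.

```javascript
// Walk an explain plan tree (winningPlan or executionStages) and list its stage names.
// Single-child stages use `inputStage`; stages such as OR use an `inputStages` array.
function collectStages(plan) {
  if (!plan) return [];
  const children = [];
  if (plan.inputStage) children.push(plan.inputStage);
  if (Array.isArray(plan.inputStages)) children.push(...plan.inputStages);
  return [plan.stage, ...children.flatMap((child) => collectStages(child))];
}

// Abridged executionStages from the question's explain output.
const executionStages = {
  stage: "SUBPLAN",
  inputStage: {
    stage: "PROJECTION_DEFAULT",
    inputStage: { stage: "COLLSCAN", docsExamined: 14827 },
  },
};

const stages = collectStages(executionStages);
console.log(stages); // [ 'SUBPLAN', 'PROJECTION_DEFAULT', 'COLLSCAN' ]
console.log(stages.includes("COLLSCAN")); // true: the whole collection was scanned
```

Once both $or clauses are backed by indexes, the same check should show IXSCAN stages and no COLLSCAN.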

Answer 1, 1 accepted, 2022-05-16 15:45:01
