
Mongo Aggregate Query Optimization

I have a collection with 2.7 million documents, and I need to fetch some data based on certain conditions. The problem is that my query scans almost 1 million documents to return only 5.

Please help me optimize this query. What index should I create to minimize the number of documents scanned?

Here is my query:

{
    "aggregate": "posts",
    "pipeline": [
      {
        "$match": {
          "status": "A",
          "hashtagIds": {
            "$oid": "5d9c866d9f733d2359a3e0e0"
          },
          "mediaLocation.mediaType": 2,
          "mediaLocation.thumbNailPath": {
            "$exists": true,
            "$ne": null
          }
        }
      },
      {
        "$lookup": {
          "from": "users",
          "localField": "userId",
          "foreignField": "_id",
          "as": "ownerData"
        }
      },
      {
        "$unwind": {
          "path": "$ownerData",
          "preserveNullAndEmptyArrays": true
        }
      },
      {
        "$sort": {
          "viewsCount": -1
        }
      },
      {
        "$limit": 5
      }
    ]
}

A better index and a reordering of the stages should help a great deal.

Index

The current pipeline uses the index on

{
  "mediaLocation.mediaType": 1,
  status: 1,
  genter: 1
}

While this index does support 2 out of the 4 queried fields, it does not support the sort operation, so the query executor must load all of the matching documents into memory and sort them to determine which 5 documents come first.
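To make the cost of the missing sort support concrete, here is a toy model in plain JavaScript. It is only an illustration: plain arrays stand in for the matching documents and for index entries, the sample data is invented, and the function names are hypothetical. It does not reproduce MongoDB's query executor.

```javascript
// Toy model only: arrays stand in for matching documents and for index entries.

// No sort support: every matching document is loaded, then sorted in memory.
function topNWithInMemorySort(matchingDocs, n) {
  const loaded = matchingDocs.slice(); // copy so the input is left untouched
  loaded.sort((a, b) => b.viewsCount - a.viewsCount);
  return { docs: loaded.slice(0, n), loadedCount: loaded.length };
}

// Entries already ordered by viewsCount (descending): stop once n are collected.
function topNFromOrderedEntries(orderedDocs, n) {
  const docs = [];
  for (const doc of orderedDocs) {
    if (docs.length >= n) break;
    docs.push(doc);
  }
  return { docs, loadedCount: docs.length };
}

// Invented sample data: 1000 "matching" documents with distinct view counts.
const sample = Array.from({ length: 1000 }, (_, i) => ({
  _id: i,
  viewsCount: (i * 37) % 1000,
}));
// Stand-in for an index that keeps entries sorted by viewsCount descending.
const ordered = sample.slice().sort((a, b) => b.viewsCount - a.viewsCount);

const slow = topNWithInMemorySort(sample, 5);
const fast = topNFromOrderedEntries(ordered, 5);
console.log("loaded with in-memory sort:", slow.loadedCount);
console.log("loaded with ordered entries:", fast.loadedCount);
```

Both functions return the same 5 documents; the difference is how many documents had to be loaded to find them.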

This query would be served much better by an index that includes all of the queried fields, and the sort field. Note that the equality-matched fields come before the sort field in the index spec:

{
  "mediaLocation.mediaType": 1,
  status: 1,
  hashtagIds: 1,
  viewsCount: -1,
  "mediaLocation.thumbNailPath": 1
}
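As a rough sketch, the index could be created from mongosh as shown in the comment below. The collection name posts comes from the question; the ascending direction on "mediaLocation.thumbNailPath" is an assumption, since that field is not the sort key and either direction should behave the same here.

```javascript
// Key order matters in a compound index, so the spec is written in this order:
// equality-matched fields, then the sort field, then the remaining filter field.
const indexSpec = {
  "mediaLocation.mediaType": 1,
  status: 1,
  hashtagIds: 1,
  viewsCount: -1,
  "mediaLocation.thumbNailPath": 1, // direction assumed, see the note above
};

// In mongosh, against the posts collection from the question:
//   db.posts.createIndex(indexSpec)
console.log(Object.keys(indexSpec).join(", "));
```

Building an index on a collection of this size takes time and resources, so it is worth trying on a test copy first.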

Stage order

In the existing pipeline:

  • $match: all 856k matching documents are retrieved
  • $lookup: 856k queries are executed against the users collection
  • $unwind: 856k array fields converted to object
  • $sort: in-memory sort of 856k documents
  • $limit: return the first 5 documents

A simple reordering of the stages, along with the above index, would significantly improve performance:

  • $match:
  • $sort:
  • $limit:
    If the above index exists, placing these stages first allows the query planner to combine these 3 stages into one, identifying documents in pre-sorted order using the index, and stopping as soon as 5 matches are found. The combined stage will read only 5 documents, plus the index keys scanned to find them
  • $lookup: executes 5 queries in the users collection
  • $unwind: convert 5 arrays to object
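The reordered pipeline might look like the sketch below. The stage contents are copied from the question and only their order changes. The hashtagId constant is a placeholder kept in the Extended JSON form the question uses; in mongosh it would need to be written as ObjectId("5d9c866d9f733d2359a3e0e0") instead.

```javascript
// Placeholder in Extended JSON form; use ObjectId("...") when running in mongosh.
const hashtagId = { $oid: "5d9c866d9f733d2359a3e0e0" };

// $match, $sort and $limit come first so that only 5 documents reach $lookup.
const pipeline = [
  {
    $match: {
      status: "A",
      hashtagIds: hashtagId,
      "mediaLocation.mediaType": 2,
      "mediaLocation.thumbNailPath": { $exists: true, $ne: null },
    },
  },
  { $sort: { viewsCount: -1 } },
  { $limit: 5 },
  {
    $lookup: {
      from: "users",
      localField: "userId",
      foreignField: "_id",
      as: "ownerData",
    },
  },
  { $unwind: { path: "$ownerData", preserveNullAndEmptyArrays: true } },
];

// In mongosh:
//   db.posts.aggregate(pipeline)
// To compare documents examined before and after the change:
//   db.posts.explain("executionStats").aggregate(pipeline)
console.log(pipeline.map((stage) => Object.keys(stage)[0]).join(" -> "));
```

Running the explain variant before and after the change is one way to confirm whether the planner actually picks the new index and how many documents it examines.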
