I have a collection with 2.7 million documents. I need to fetch some data based on certain conditions. The problem is that my query scans almost 1 million documents to return only 5.
Please help me optimize this query. What index should I create to minimize the number of documents scanned?
Here is my query:
{
"aggregate": "posts",
"pipeline": [
{
"$match": {
"status": "A",
"hashtagIds": {
"$oid": "5d9c866d9f733d2359a3e0e0"
},
"mediaLocation.mediaType": 2,
"mediaLocation.thumbNailPath": {
"$exists": true,
"$ne": null
}
}
},
{
"$lookup": {
"from": "users",
"localField": "userId",
"foreignField": "_id",
"as": "ownerData"
}
},
{
"$unwind": {
"path": "$ownerData",
"preserveNullAndEmptyArrays": true
}
},
{
"$sort": {
"viewsCount": -1
}
},
{
"$limit": 5
}
]
}
A better index and a reordering of the stages should help a great deal.
Index
The current pipeline uses the index on
{
"mediaLocation.mediaType": 1,
status: 1,
genter: 1
}
While this index supports two of the four queried fields, it does not support the sort operation, so the query executor must load all of the matching documents into memory and sort them to determine which 5 documents come first.
This query would be served much better by an index that includes all of the queried fields, and the sort field. Note that the equality-matched fields come before the sort field in the index spec:
{
"mediaLocation.mediaType": 1,
status: 1,
hashtagIds: 1,
viewsCount: -1,
"mediaLocation.thumbNailPath": 1
}
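As a rough sketch, the index above could be created from mongosh along these lines. The collection name posts comes from the question; the variable name indexSpec and the typeof db guard are my own additions, there only so the snippet also loads outside the shell.

```javascript
// Proposed compound index: equality-matched fields first, then the sort
// field, then the field that is only filtered with $exists / $ne.
const indexSpec = {
  "mediaLocation.mediaType": 1,
  status: 1,
  hashtagIds: 1,
  viewsCount: -1,
  "mediaLocation.thumbNailPath": 1,
};

// `db` is defined inside mongosh; the guard keeps the snippet harmless elsewhere.
if (typeof db !== "undefined") {
  db.posts.createIndex(indexSpec);
}
```

Building an index on a 2.7 million document collection takes time and disk space, so it is worth trying this on a copy of the data first.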
Stage order
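In the existing pipeline, $lookup and $unwind run on every document that passes $match, and only afterwards do $sort and $limit cut the result down to 5, so the users lookup is performed for many documents that are then discarded.
A simple reordering of the stages ($match, $sort, $limit, $lookup, $unwind), along with the above index, would significantly improve performance.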
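A sketch of what the reordered pipeline might look like in mongosh is below. The stage contents are copied from the question; the names hashtagId and reorderedPipeline, and the guards that let the snippet load outside the shell, are assumptions of mine. Because users._id is unique, each post gains at most one ownerData entry, so limiting before the lookup should return the same 5 posts.

```javascript
// ObjectId exists in mongosh; fall back to the extended JSON form elsewhere.
const hashtagId =
  typeof ObjectId === "function"
    ? new ObjectId("5d9c866d9f733d2359a3e0e0")
    : { $oid: "5d9c866d9f733d2359a3e0e0" };

const reorderedPipeline = [
  {
    $match: {
      status: "A",
      hashtagIds: hashtagId,
      "mediaLocation.mediaType": 2,
      "mediaLocation.thumbNailPath": { $exists: true, $ne: null },
    },
  },
  // Sort and limit first, so only 5 documents reach the lookup.
  { $sort: { viewsCount: -1 } },
  { $limit: 5 },
  {
    $lookup: {
      from: "users",
      localField: "userId",
      foreignField: "_id",
      as: "ownerData",
    },
  },
  { $unwind: { path: "$ownerData", preserveNullAndEmptyArrays: true } },
];

// Only runs inside mongosh, where `db` is defined.
if (typeof db !== "undefined") {
  printjson(db.posts.aggregate(reorderedPipeline).toArray());
}
```

Running db.posts.explain("executionStats").aggregate(reorderedPipeline) before and after the change is one way to check whether the number of documents examined actually drops.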