I have stuck somewhere in MongoDB aggregate query. I tried to generate a summary report from the database which contains 110M records. during the report generation, I faced the following issues 1).Even though the collection is indexed they are not utilized for the search. 2).Once query execution finished memory of DB server not decreased. 3)query take considerable time to return the result.
im useing mongodb Atlas v4.2.8 sample document
{
"_id": {
"$oid": "5eb122f714d0510011e3a184"
},
"from": "Star_friends",
"to": "94713414047",
"accountName": "ZM",
"accountId": "ZM",
"campaignName": "test 1",
"campaignId": "5eb122f1e921c3001922f73c",
"campaignType": "BULK",
"status": {
"$numberInt": "3"
},
"reason": "No Routing",
"channel": "sms",
"messageType": {
"$numberInt": "1"
},
"event": "MT",
"content": "test 132",
"credit": {
"$numberInt": "1"
},
"msgId": "",
"createdDateTime": "2020-05-05T13:55:27.743Z",
"updatedTime": "2020-05-05T13:55:27.745Z",
"uDate": "2020-05-05",
"operator": "mobitel"
}
my query as follows
db.getCollection('report').aggregate([{
"$match": {
"createdDateTime": {
"$gt": "2020-09-14T00:00:01.000Z",
"$lt": "2020-09-15T23:59:99.999Z"
},
"messageType": {
"$in": [1, 2]
},
"channel": {
"$in": ["sms", "viber", "whatsapp"]
},
"accountId": {
"$in": ["ZM", "KEELLS"]
}
}
}, {
"$project": {
"_id": 0,
"channel": 1,
"messageType": 1,
"accountName": 1,
"accountId": 1,
"createdDateTime": 1,
"uDate": 1,
"credit": 1,
"status": 1
}
}, {
"$group": {
"_id": {
"channel": "$channel",
"messageType": "$messageType",
"accountName": "$accountName",
"accountId": "$accountId",
"filteredDate": {
"$substr": ["$createdDateTime", 0, 7]
},
"sortDate": "$uDate"
},
"total": {
"$sum": "$credit"
},
"send": {
"$sum": {
"$cond": [{
"$in": ["$status", [2, 15, 1, 14, 6, 17, 4, 5]]
}, "$credit", 0]
}
},
"delivered": {
"$sum": {
"$cond": [{
"$in": ["$status", [6, 17, 4]]
},
"$credit",
0
]
}
},
"deliveryFailed": {
"$sum": {
"$cond": [{
"$in": ["$status", [12, 5]]
}, "$credit", 0]
}
},
"failed": {
"$sum": {
"$cond": [{
"$in": ["$status", [3]]
}, "$credit", 0]
}
},
"datass": {
"$addToSet": {
"channel": "$channel",
"messageType": "$messageType",
"accountName": "$accountName",
"accountId": "$accountId",
"filteredDate": {
"$substr": ["$createdDateTime", 0, 7]
},
"sortDate": "$uDate"
}
}
}
}, {
"$unwind": "$datass"
}, {
"$project": {
"_id": 0
}
}, {
"$sort": {
"datass.sortDate": -1
}
}])
indexes as follows
accountId_1 / accountId_1_createdDateTime_-1 / campaignId_-1 / channel_1 / createdDateTime_-1 / messageType_1 / msgId_-1 / msgId_-1_status_1
I would be appreciated if someone can help me with this
Thanks
You gave us little information. How many documents should average query like such return? How long does it take to execute the said query?
What I can see here is that your match pipeline is good, because you are trying to filter out documents by fields that are indexed. But what is a "performance smell" here is your $sort function which does sorting on non-indexed field. Try to do sorting immediately after $match.
Play with it a little more and try to figure out which stage of the pipeline is a performance bottle-neck.
I have resolved my issue by changing my indexes
accountId_1_createdDateTime_-1 / msgId_-1_status_1 / accountId_1_messageType_1_channel_1_createdDateTime_1_accountName_1_uDate_1_credit_1_status_1
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.