简体   繁体   中英

Indexing not utilized during the MongoDB aggregation query

I have stuck somewhere in MongoDB aggregate query. I tried to generate a summary report from the database which contains 110M records. during the report generation, I faced the following issues 1).Even though the collection is indexed they are not utilized for the search. 2).Once query execution finished memory of DB server not decreased. 3)query take considerable time to return the result.

im useing mongodb Atlas v4.2.8 sample document

{
    "_id": {
        "$oid": "5eb122f714d0510011e3a184"
    },
    "from": "Star_friends",
    "to": "94713414047",
    "accountName": "ZM",
    "accountId": "ZM",
    "campaignName": "test 1",
    "campaignId": "5eb122f1e921c3001922f73c",
    "campaignType": "BULK",
    "status": {
        "$numberInt": "3"
    },
    "reason": "No Routing",
    "channel": "sms",
    "messageType": {
        "$numberInt": "1"
    },
    "event": "MT",
    "content": "test 132",
    "credit": {
        "$numberInt": "1"
    },
    "msgId": "",
    "createdDateTime": "2020-05-05T13:55:27.743Z",
    "updatedTime": "2020-05-05T13:55:27.745Z",
    "uDate": "2020-05-05",
    "operator": "mobitel"
}

my query as follows

db.getCollection('report').aggregate([{
    "$match": {
        "createdDateTime": {
            "$gt": "2020-09-14T00:00:01.000Z",
            "$lt": "2020-09-15T23:59:99.999Z"
        },
        "messageType": {
            "$in": [1, 2]
        },
        "channel": {
            "$in": ["sms", "viber", "whatsapp"]
        },
        "accountId": {
            "$in": ["ZM", "KEELLS"]
        }
    }
}, {
    "$project": {
        "_id": 0,
        "channel": 1,
        "messageType": 1,
        "accountName": 1,
        "accountId": 1,
        "createdDateTime": 1,
        "uDate": 1,
        "credit": 1,
        "status": 1
    }
}, {
    "$group": {
        "_id": {
            "channel": "$channel",
            "messageType": "$messageType",
            "accountName": "$accountName",
            "accountId": "$accountId",
            "filteredDate": {
                "$substr": ["$createdDateTime", 0, 7]
            },
            "sortDate": "$uDate"
        },
        "total": {
            "$sum": "$credit"
        },
        "send": {
            "$sum": {
                "$cond": [{
                    "$in": ["$status", [2, 15, 1, 14, 6, 17, 4, 5]]
                }, "$credit", 0]
            }
        },
        "delivered": {
            "$sum": {
                "$cond": [{
                        "$in": ["$status", [6, 17, 4]]
                    },
                    "$credit",
                    0
                ]
            }
        },
        "deliveryFailed": {
            "$sum": {
                "$cond": [{
                    "$in": ["$status", [12, 5]]
                }, "$credit", 0]
            }
        },
        "failed": {
            "$sum": {
                "$cond": [{
                    "$in": ["$status", [3]]
                }, "$credit", 0]
            }
        },
        "datass": {
            "$addToSet": {
                "channel": "$channel",
                "messageType": "$messageType",
                "accountName": "$accountName",
                "accountId": "$accountId",
                "filteredDate": {
                    "$substr": ["$createdDateTime", 0, 7]
                },
                "sortDate": "$uDate"
            }
        }
    }
}, {
    "$unwind": "$datass"
}, {
    "$project": {
        "_id": 0
    }
}, {
    "$sort": {
        "datass.sortDate": -1
    }
}])

indexes as follows

accountId_1 / accountId_1_createdDateTime_-1 / campaignId_-1 / channel_1 / createdDateTime_-1 / messageType_1 / msgId_-1 / msgId_-1_status_1

I would be appreciated if someone can help me with this

Thanks

You gave us little information. How many documents should average query like such return? How long does it take to execute the said query?

What I can see here is that your match pipeline is good, because you are trying to filter out documents by fields that are indexed. But what is a "performance smell" here is your $sort function which does sorting on non-indexed field. Try to do sorting immediately after $match.

Play with it a little more and try to figure out which stage of the pipeline is a performance bottle-neck.

I have resolved my issue by changing my indexes

accountId_1_createdDateTime_-1 / msgId_-1_status_1 / accountId_1_messageType_1_channel_1_createdDateTime_1_accountName_1_uDate_1_credit_1_status_1

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM