Indexes not used during a MongoDB aggregation query

I am stuck on a MongoDB aggregation query. I am trying to generate a summary report from a collection that contains 110M records. During report generation I ran into the following issues:

1) Even though the collection is indexed, the indexes are not used for the search.
2) Once the query has finished, memory usage on the DB server does not decrease.
3) The query takes a considerable time to return the result.

I'm using MongoDB Atlas v4.2.8. Sample document:

{
    "_id": {
        "$oid": "5eb122f714d0510011e3a184"
    },
    "from": "Star_friends",
    "to": "94713414047",
    "accountName": "ZM",
    "accountId": "ZM",
    "campaignName": "test 1",
    "campaignId": "5eb122f1e921c3001922f73c",
    "campaignType": "BULK",
    "status": {
        "$numberInt": "3"
    },
    "reason": "No Routing",
    "channel": "sms",
    "messageType": {
        "$numberInt": "1"
    },
    "event": "MT",
    "content": "test 132",
    "credit": {
        "$numberInt": "1"
    },
    "msgId": "",
    "createdDateTime": "2020-05-05T13:55:27.743Z",
    "updatedTime": "2020-05-05T13:55:27.745Z",
    "uDate": "2020-05-05",
    "operator": "mobitel"
}

My query is as follows:

db.getCollection('report').aggregate([{
    "$match": {
        "createdDateTime": {
            "$gt": "2020-09-14T00:00:01.000Z",
            "$lt": "2020-09-15T23:59:59.999Z"
        },
        "messageType": {
            "$in": [1, 2]
        },
        "channel": {
            "$in": ["sms", "viber", "whatsapp"]
        },
        "accountId": {
            "$in": ["ZM", "KEELLS"]
        }
    }
}, {
    "$project": {
        "_id": 0,
        "channel": 1,
        "messageType": 1,
        "accountName": 1,
        "accountId": 1,
        "createdDateTime": 1,
        "uDate": 1,
        "credit": 1,
        "status": 1
    }
}, {
    "$group": {
        "_id": {
            "channel": "$channel",
            "messageType": "$messageType",
            "accountName": "$accountName",
            "accountId": "$accountId",
            "filteredDate": {
                "$substr": ["$createdDateTime", 0, 7]
            },
            "sortDate": "$uDate"
        },
        "total": {
            "$sum": "$credit"
        },
        "send": {
            "$sum": {
                "$cond": [{
                    "$in": ["$status", [2, 15, 1, 14, 6, 17, 4, 5]]
                }, "$credit", 0]
            }
        },
        "delivered": {
            "$sum": {
                "$cond": [{
                    "$in": ["$status", [6, 17, 4]]
                }, "$credit", 0]
            }
        },
        "deliveryFailed": {
            "$sum": {
                "$cond": [{
                    "$in": ["$status", [12, 5]]
                }, "$credit", 0]
            }
        },
        "failed": {
            "$sum": {
                "$cond": [{
                    "$in": ["$status", [3]]
                }, "$credit", 0]
            }
        },
        "datass": {
            "$addToSet": {
                "channel": "$channel",
                "messageType": "$messageType",
                "accountName": "$accountName",
                "accountId": "$accountId",
                "filteredDate": {
                    "$substr": ["$createdDateTime", 0, 7]
                },
                "sortDate": "$uDate"
            }
        }
    }
}, {
    "$unwind": "$datass"
}, {
    "$project": {
        "_id": 0
    }
}, {
    "$sort": {
        "datass.sortDate": -1
    }
}])

The indexes are as follows:

accountId_1 / accountId_1_createdDateTime_-1 / campaignId_-1 / channel_1 / createdDateTime_-1 / messageType_1 / msgId_-1 / msgId_-1_status_1

I would appreciate it if someone could help me with this.

Thanks

You gave us little information. How many documents should a typical query like this return? How long does the query take to execute?

What I can see is that your $match stage is good, because you are filtering documents by fields that are indexed. The "performance smell" here is your $sort stage, which sorts on a non-indexed field. Try sorting immediately after $match.

Experiment with it a little more and try to figure out which stage of the pipeline is the performance bottleneck.
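One way to see whether an index is being picked up is to look at the explain output for the aggregation and check for IXSCAN versus COLLSCAN stages. Below is a minimal sketch of a helper that walks an explain document and lists the plan stages it finds. The helper name `collectStages` and the `samplePlan` object are my own for illustration; the sample is hand-written and abbreviated, not real server output.

```javascript
// Recursively collect every plan "stage" name (e.g. IXSCAN, COLLSCAN, FETCH)
// found anywhere inside an explain document.
function collectStages(node, found = []) {
  if (Array.isArray(node)) {
    node.forEach((child) => collectStages(child, found));
  } else if (node !== null && typeof node === "object") {
    if (typeof node.stage === "string") {
      // Append the index name when the stage reports one.
      found.push(node.indexName ? `${node.stage}(${node.indexName})` : node.stage);
    }
    Object.values(node).forEach((child) => collectStages(child, found));
  }
  return found;
}

// Hand-written, abbreviated plan for illustration only (assumption, not real output).
const samplePlan = {
  stages: [{
    $cursor: {
      queryPlanner: {
        winningPlan: {
          stage: "FETCH",
          inputStage: { stage: "IXSCAN", indexName: "accountId_1_createdDateTime_-1" },
        },
      },
    },
  }],
};

const sampleStages = collectStages(samplePlan);

// In the mongo shell, where `db` exists, the helper could be pointed at a real plan:
// const plan = db.getCollection('report').explain().aggregate(pipeline);
// printjson(collectStages(plan));
```

If the list for the real query shows COLLSCAN, the $match stage is scanning the whole collection; an IXSCAN entry tells you which index the planner chose.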

I resolved my issue by changing my indexes to:

accountId_1_createdDateTime_-1 / msgId_-1_status_1 / accountId_1_messageType_1_channel_1_createdDateTime_1_accountName_1_uDate_1_credit_1_status_1
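For anyone reading later, here is a rough sketch of what the new compound index looks like as a key pattern, assuming the name above is the default one MongoDB generates from the keys. It also checks that every field the $match and first $project stage read is present in that index, which would let the query be served from the index alone (a covered query). That is only my reading of why the change helped; the names `coveringIndexKeys`, `fieldsUsedByPipeline` and `defaultIndexName` are made up for this example.

```javascript
// Key pattern implied by the index name, assuming it is the default generated name.
const coveringIndexKeys = {
  accountId: 1,
  messageType: 1,
  channel: 1,
  createdDateTime: 1,
  accountName: 1,
  uDate: 1,
  credit: 1,
  status: 1,
};

// Fields read by the $match and the first $project stage of the query in the question.
const fieldsUsedByPipeline = [
  "createdDateTime", "messageType", "channel", "accountId",
  "accountName", "uDate", "credit", "status",
];

// Any field listed here would force MongoDB to fetch the full document.
const missing = fieldsUsedByPipeline.filter((field) => !(field in coveringIndexKeys));

// Rebuild the default-style index name (field_direction joined by "_") from the keys.
function defaultIndexName(keys) {
  return Object.entries(keys)
    .map(([field, direction]) => `${field}_${direction}`)
    .join("_");
}

// In the mongo shell the index itself would be created with:
// db.getCollection('report').createIndex(coveringIndexKeys);
```

Here `missing` comes out empty, so the pipeline never needs a field outside the index, provided `_id` stays excluded in the $project as it is in the question.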
