MongoDB - Aggregation performance tuning

Question

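One of my aggregation pipelines is running rather slowly.

About the collection

The collection is named Document. Each document can belong to multiple campaigns and is in one of five statuses, 'a' to 'e'. A small portion of the documents may not belong to any campaign, and their campaigns field is set to null.

Sample document: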

{_id:id, campaigns:['c1', 'c2'], status:'a', ...other fields...}

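Some collection stats

Number of documents: only 2 million :(
Size: 2GB
Average document size: 980 bytes
Storage size: 780MB
Total index size: 134MB
Number of indexes: 12
Number of fields per document: 30-40, which may include arrays or objects.

About the query

The goal of the query is to count the number of documents per campaign per status, for documents whose status is in ['a', 'b', 'c'].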

[
    {$match:{campaigns:{$ne:null}, status:{$in:['a','b','c']}}},
    {$unwind:'$campaigns'},
    {$group:{_id:{campaign:'$campaigns', status:'$status'}, total:{$sum:1}}}
]
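
To make the intended result concrete, the three stages can be mimicked on an in-memory array in plain JavaScript. This is only an illustrative sketch: the sample documents and the helper name countByCampaignAndStatus are made up here, and it says nothing about how MongoDB actually executes the pipeline.

```javascript
// Hypothetical sample documents, shaped like the ones described above.
const docs = [
  { _id: 1, campaigns: ['c1', 'c2'], status: 'a' },
  { _id: 2, campaigns: ['c1'], status: 'b' },
  { _id: 3, campaigns: null, status: 'a' },   // dropped: campaigns is null
  { _id: 4, campaigns: ['c2'], status: 'e' }, // dropped: status not in the list
  { _id: 5, campaigns: ['c1'], status: 'a' },
];

// Assumes campaigns is either null/missing or an array of ids.
function countByCampaignAndStatus(documents, statuses) {
  const totals = new Map();
  for (const doc of documents) {
    // Equivalent of $match: skip null/missing campaigns and unwanted statuses.
    if (doc.campaigns == null || !statuses.includes(doc.status)) continue;
    // Equivalent of $unwind: one iteration per array element.
    for (const campaign of doc.campaigns) {
      // Equivalent of $group with $sum: 1, keyed by (campaign, status).
      const key = JSON.stringify({ campaign, status: doc.status });
      totals.set(key, (totals.get(key) || 0) + 1);
    }
  }
  return [...totals].map(([key, total]) => ({ _id: JSON.parse(key), total }));
}

const result = countByCampaignAndStatus(docs, ['a', 'b', 'c']);
console.log(result);
```

Each element of the result has the same shape as a document emitted by the $group stage: an _id holding the campaign and status, plus a total.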

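The aggregation is expected to touch almost the entire collection. Without an index, the aggregation takes about 8 seconds to complete.

I tried creating an index on

{campaigns:1, status:1}

The explain plan shows that the index was scanned, but the aggregation took nearly 11 seconds to complete.

Question

The index contains all the fields the aggregation needs to do the counting. Shouldn't the aggregation hit only the index? The index is only 10MB. How can it be slower? If not an index, is there any other recommendation for optimizing the query?

The winning plan shows: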

{
    "stage" : "FETCH",
    "filter" : {"$not" : {"campaigns" : {"$eq" : null}}},
    "inputStage" : {
        "stage" : "IXSCAN",
        "keyPattern" : {"campaigns" : 1.0,"status" : 1.0},
        "indexName" : "campaigns_1_status_1",
        "isMultiKey" : true,
        "isUnique" : false,
        "isSparse" : false,
        "isPartial" : false,
        "indexVersion" : 1,
        "direction" : "forward",
        "indexBounds" : {
            "campaigns" : ["[MinKey, null)", "(null, MaxKey]"],
            "status" : [ "[\"a\", \"a\"]", "[\"b\", \"b\"]", "[\"c\", \"c\"]"]
        }
    }
}
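
One way to read this plan: isMultiKey is true, so the index holds one entry per element of campaigns, and the FETCH stage on top means the matching documents are still loaded so that the null filter can be applied. The rough model below counts both, using made-up documents and a hypothetical buildIndexEntries helper. It simplifies how a real index scan walks its bounds.

```javascript
// Hypothetical sample data; campaigns is either an array of ids or null.
const docs = [
  { _id: 1, campaigns: ['c1', 'c2', 'c3'], status: 'a' },
  { _id: 2, campaigns: ['c1'], status: 'b' },
  { _id: 3, campaigns: null, status: 'a' },
  { _id: 4, campaigns: ['c2', 'c3'], status: 'e' },
];

// Model of a multikey index on (campaigns, status):
// one entry per array element, or a single entry for a non-array value.
function buildIndexEntries(documents) {
  const entries = [];
  for (const doc of documents) {
    const values = Array.isArray(doc.campaigns) ? doc.campaigns : [doc.campaigns];
    for (const campaign of values) {
      entries.push({ campaign, status: doc.status, docId: doc._id });
    }
  }
  return entries;
}

const entries = buildIndexEntries(docs);

// Entries inside the bounds shown in the plan: campaigns != null, status in a/b/c.
const scanned = entries.filter(
  (e) => e.campaign !== null && ['a', 'b', 'c'].includes(e.status)
);

// Distinct documents the FETCH stage would then have to load.
const fetched = new Set(scanned.map((e) => e.docId));

console.log(entries.length, scanned.length, fetched.size); // 7 4 2
```

In this model the index scan visits more entries than there are matching documents, and every matching document is fetched anyway. When almost the whole collection matches, that is more work than a single pass over the collection.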

Without the index, the winning plan is:

{
    "stage" : "COLLSCAN",
    "filter" : {
        "$and":[
            {"status": {"$in": ["a", "b", "c"]}},
            {"$not" : {"campaigns": {"$eq" : null}}}
        ]
    },
    "direction" : "forward"
}

Update

As requested by @Kevin, here are the details of all the other indexes. Sizes are in MB.

"indexSizes" : {
    "_id_" : 32,
    "team_1" : 8, //Single value field of ObjectId
    "created_time_1" : 16, //Document publish time in source system.
    "parent_1" : 2, //_id of parent document. 
    "by.id_1" : 13, //_id of author from a different collection. 
    "feedids_1" : 8, //Array, _id of ETL jobs contributing to sync of this doc.
    "init_-1" : 2, //Initial load time of the doc.
    "campaigns_1" : 10, //Array, _id of campaigns
    "last_fetch_-1" : 13, //Last sync time of the doc. 
    "categories_1" : 8, //Array, _id of document categories. 
    "status_1" : 8, //Status
    "campaigns_1_status_1" : 10 //Combined index of campaign _id and status. 
},

Answer 1

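After reading the MongoDB documentation, I found this:

The inequality operator $ne is not very selective since it often matches a large portion of the index. As a result, in many cases, a $ne query with an index may perform no better than a $ne query that must scan all documents in a collection. See also Query Selectivity.

Judging from a few different articles, using the $type operator might solve this problem.

You could use the following query: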

db.data.aggregate([
    // $type: 2 is the BSON type code for string
    {$match:{campaigns:{$type:2},status:{$in:["a","b","c"]}}},
    {$unwind:'$campaigns'},
    {$group:{_id:{campaign:'$campaigns', status:'$status'}, total:{$sum:1}}}
])
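
Before swapping the predicate, it may be worth checking that both filters select the same documents. The sketch below models the two predicates in plain JavaScript. It assumes that for an array field $type matches when at least one element has the given type, and that arrays never contain null. The helper names and sample values are hypothetical. Note that $type: 2 only matches string ids; if campaigns actually holds ObjectId values, as the index comment "Array, _id of campaigns" might suggest, a different type code would be needed.

```javascript
// Rough model of {campaigns: {$ne: null}}: rejects null and missing values.
// Assumption: arrays in the data never contain a null element.
function matchesNeNull(doc) {
  return doc.campaigns !== null && doc.campaigns !== undefined;
}

// Rough model of {campaigns: {$type: 2}}:
// a string value, or an array with at least one string element.
function matchesTypeString(doc) {
  const value = doc.campaigns;
  if (typeof value === 'string') return true;
  return Array.isArray(value) && value.some((e) => typeof e === 'string');
}

// Hypothetical sample values covering the edge cases.
const samples = [
  { _id: 1, campaigns: ['c1', 'c2'] }, // array of strings
  { _id: 2, campaigns: null },         // explicit null
  { _id: 3 },                          // field missing
  { _id: 4, campaigns: [] },           // empty array
  { _id: 5, campaigns: [101, 102] },   // array of non-string ids
];

for (const doc of samples) {
  console.log(doc._id, matchesNeNull(doc), matchesTypeString(doc));
}
```

In this model the two filters disagree only on empty arrays and on arrays without string elements, so the rewrite is safe only if such documents do not occur or should be excluded anyway.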

Answer 1
0 2016-05-24 11:36:16
