MongoDB nested object aggregation counting

Question

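I have a heavily nested set of MongoDB documents, and I want to count the number of subdocuments that match a given condition (Edit: within each document). For example: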

{"_id":{"chr":"20","pos":"14371","ref":"A","alt":"G"},
"studies":[
    {
        "study_id":"Study1",
        "samples":[
            {
                "sample_id":"NA00001",
                "formatdata":[
                    {"GT":"1|0","GQ":48,"DP":8,"HQ":[51,51]}
                ]
            },
            {
                "sample_id":"NA00002",
                "formatdata":[
                    {"GT":"0|0","GQ":48,"DP":8,"HQ":[51,51]}
                ]
            }
        ]
    }
]
}
{"_id":{"chr":"20","pos":"14372","ref":"T","alt":"AA"},
"studies":[
    {
        "study_id":"Study3",
        "samples":[
            {
                "sample_id":"SAMPLE1",
                "formatdata":[
                    {"GT":"1|0","GQ":48,"DP":8,"HQ":[51,51]}
                ]
            },
            {
                "sample_id":"SAMPLE2",
                "formatdata":[
                    {"GT":"1|0","GQ":48,"DP":8,"HQ":[51,51]}
                ]
            }
        ]
    }
]
}
{"_id":{"chr":"20","pos":"14373","ref":"C","alt":"A"},
"studies":[
    {
        "study_id":"Study3",
        "samples":[
            {
                "sample_id":"SAMPLE3",
                "formatdata":[
                    {"GT":"0|0","GQ":48,"DP":8,"HQ":[51,51]}
                ]
            },
            {
                "sample_id":"SAMPLE7",
                "formatdata":[
                    {"GT":"0|0","GQ":48,"DP":8,"HQ":[51,51]}
                ]
            }
        ]
    }
]
}

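I want to know how many subdocuments contain GT: "1|0", which in this case would be 1 in the first document, 2 in the second, and 0 in the third. I have tried the unwind and aggregate functions, but I am obviously not doing something right. When I try to count the subdocuments by the "GT" field, mongo complains: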

db.collection.aggregate([{$group: {"$studies.samples.formatdata.GT":1,_id:0}}])

because my group's field names cannot contain ".", yet if I leave the path out:

db.collection.aggregate([{$group: {"$GT":1,_id:0}}])

it complains because "$GT cannot be an operator name".

Any ideas?

Answer 1

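When working with arrays you need to process them with $unwind, and here you need to do it three times, once per level of nesting: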

 db.collection.aggregate([

     // Unwind each nested array so its elements can be filtered and grouped
     { "$unwind": "$studies" },
     { "$unwind": "$studies.samples" },
     { "$unwind": "$studies.samples.formatdata" },

     // Group results to obtain the count per distinct GT value
     { "$group": {
         "_id": "$studies.samples.formatdata.GT",
         "count": { "$sum": 1 }
     }}
 ])
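To make the effect of the three $unwind stages concrete, here is a minimal sketch in plain JavaScript that mimics them on in-memory data. It does not talk to MongoDB: `getPath`, `setPath`, `unwind`, `countBy`, and `makeDoc` are hypothetical helpers written for this illustration, the documents are abbreviated versions of the ones in the question, and the emulation is simplified (for instance, it drops documents where the path is not an array).

```javascript
// Plain JavaScript sketch; `getPath`, `setPath`, `unwind`, and `countBy`
// are hypothetical helpers for illustration, not MongoDB APIs.

// Read the value at a dotted path such as "studies.samples".
function getPath(obj, path) {
  return path
    .split(".")
    .reduce((node, key) => (node == null ? undefined : node[key]), obj);
}

// Return a deep copy of `obj` with the value at the dotted path replaced.
function setPath(obj, path, value) {
  const copy = JSON.parse(JSON.stringify(obj));
  const keys = path.split(".");
  let node = copy;
  for (const key of keys.slice(0, -1)) node = node[key];
  node[keys[keys.length - 1]] = value;
  return copy;
}

// Simplified emulation of $unwind: one output document per array element.
function unwind(docs, path) {
  return docs.flatMap((doc) => {
    const arr = getPath(doc, path);
    return Array.isArray(arr) ? arr.map((item) => setPath(doc, path, item)) : [];
  });
}

// Emulation of { $group: { _id: <path>, count: { $sum: 1 } } }.
function countBy(docs, path) {
  const counts = {};
  for (const doc of docs) {
    const key = getPath(doc, path);
    counts[key] = (counts[key] || 0) + 1;
  }
  return counts;
}

// Abbreviated versions of the three documents from the question.
const makeDoc = (pos, gts) => ({
  _id: { chr: "20", pos },
  studies: [{ samples: gts.map((GT) => ({ formatdata: [{ GT }] })) }],
});
const docs = [
  makeDoc("14371", ["1|0", "0|0"]),
  makeDoc("14372", ["1|0", "1|0"]),
  makeDoc("14373", ["0|0", "0|0"]),
];

// Unwind one nesting level at a time, then count per GT value.
let rows = unwind(docs, "studies");
rows = unwind(rows, "studies.samples");
rows = unwind(rows, "studies.samples.formatdata");
const totals = countBy(rows, "studies.samples.formatdata.GT");
console.log(totals);
```

Each unwind step turns one array level into separate rows, so the three documents become six rows, and the final count covers the whole collection rather than each document separately.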

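Ideally you want to filter your input. You can do this with a $match both before and after the $unwind stages, using a $regex to match entries whose GT value begins with "1".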

 db.collection.aggregate([

     // Match first to exclude documents where no array member has a matching GT
     { "$match": { "studies.samples.formatdata.GT": /^1/ } },

     // Unwind each nested array so its elements can be filtered
     { "$unwind": "$studies" },
     { "$unwind": "$studies.samples" },
     { "$unwind": "$studies.samples.formatdata" },

     // Match again to keep only the unwound entries that satisfy the condition
     { "$match": { "studies.samples.formatdata.GT": /^1/ } },

     // Group results to obtain the matched count per document and GT value
     { "$group": {
         "_id": {
              "_id": "$_id",
              "key": "$studies.samples.formatdata.GT"
         },
         "count": { "$sum": 1 }
     }}
 ])
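Because both $match stages discard anything that does not match, a document with no matching GT produces no group at all, whereas the question also expects a count of 0 for the third document. As a rough cross-check of the expected per-document numbers, here is a plain JavaScript sketch over in-memory copies of the data. `countMatchingGT` and `sampleDocs` are names invented for this illustration, and the code does not query MongoDB.

```javascript
// Plain JavaScript sketch; `countMatchingGT` is a hypothetical helper,
// not a MongoDB API. It flattens the nested arrays and counts the
// formatdata entries whose GT matches `pattern`.
function countMatchingGT(doc, pattern) {
  return (doc.studies || [])
    .flatMap((study) => study.samples || [])
    .flatMap((sample) => sample.formatdata || [])
    .filter((entry) => typeof entry.GT === "string" && pattern.test(entry.GT))
    .length;
}

// Abbreviated versions of the three documents from the question.
const sampleDocs = [
  { _id: { pos: "14371" }, studies: [{ samples: [
    { formatdata: [{ GT: "1|0" }] },
    { formatdata: [{ GT: "0|0" }] },
  ] }] },
  { _id: { pos: "14372" }, studies: [{ samples: [
    { formatdata: [{ GT: "1|0" }] },
    { formatdata: [{ GT: "1|0" }] },
  ] }] },
  { _id: { pos: "14373" }, studies: [{ samples: [
    { formatdata: [{ GT: "0|0" }] },
    { formatdata: [{ GT: "0|0" }] },
  ] }] },
];

// One result per document, including documents with zero matches.
const perDocument = sampleDocs.map((doc) => ({
  _id: doc._id,
  count: countMatchingGT(doc, /^1/),
}));
console.log(perDocument.map((row) => row.count));
```

The counts come out as 1, 2, and 0, matching what the question describes. The pipeline above would return groups only for the first two documents.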

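Note that in all cases the "$"-prefixed entries are "variables" that refer to properties of the document. They are "values" and belong on the right-hand side. The left-hand "keys" must be specified as plain string keys; no variable can be used to name a key. This is why both attempts in the question fail: they put a field path where $group expects an output field name.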
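The key/value distinction can be illustrated by treating the $group stage as an ordinary JavaScript object. The sketch below is only an illustration: `isValidGroupOutputName` is a hypothetical check that mirrors the two complaints quoted in the question (names containing "." and names starting with "$"). It is not part of MongoDB or any driver, and the server's actual validation may cover more cases.

```javascript
// `isValidGroupOutputName` is a hypothetical check mirroring the two errors
// quoted in the question; it is not part of MongoDB or any driver.
function isValidGroupOutputName(name) {
  return (
    typeof name === "string" &&
    name.length > 0 &&
    !name.startsWith("$") && // "$GT" reads as an operator name
    !name.includes(".")      // dotted paths are not allowed as output names
  );
}

// Plain keys on the left, "$"-prefixed field paths and operators on the right.
const groupStage = {
  $group: {
    _id: "$studies.samples.formatdata.GT",
    count: { $sum: 1 },
  },
};

const outputNames = Object.keys(groupStage.$group);
console.log(outputNames.every(isValidGroupOutputName));
console.log(isValidGroupOutputName("$studies.samples.formatdata.GT"));
console.log(isValidGroupOutputName("$GT"));
```

The stage from the answer passes the check, while the two keys tried in the question do not.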

Solution 1
16, accepted, 2015-01-13 04:46:03
