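MongoDB nested object aggregation counting
I have a highly nested set of MongoDB documents, and I want to count the subdocuments that match a given condition (edit: within each document). For example: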
{"_id":{"chr":"20","pos":"14371","ref":"A","alt":"G"},
"studies":[
{
"study_id":"Study1",
"samples":[
{
"sample_id":"NA00001",
"formatdata":[
{"GT":"1|0","GQ":48,"DP":8,"HQ":[51,51]}
]
},
{
"sample_id":"NA00002",
"formatdata":[
{"GT":"0|0","GQ":48,"DP":8,"HQ":[51,51]}
]
}
]
}
]
}
{"_id":{"chr":"20","pos":"14372","ref":"T","alt":"AA"},
"studies":[
{
"study_id":"Study3",
"samples":[
{
"sample_id":"SAMPLE1",
"formatdata":[
{"GT":"1|0","GQ":48,"DP":8,"HQ":[51,51]}
]
},
{
"sample_id":"SAMPLE2",
"formatdata":[
{"GT":"1|0","GQ":48,"DP":8,"HQ":[51,51]}
]
}
]
}
]
}
{"_id":{"chr":"20","pos":"14373","ref":"C","alt":"A"},
"studies":[
{
"study_id":"Study3",
"samples":[
{
"sample_id":"SAMPLE3",
"formatdata":[
{"GT":"0|0","GQ":48,"DP":8,"HQ":[51,51]}
]
},
{
"sample_id":"SAMPLE7",
"formatdata":[
{"GT":"0|0","GQ":48,"DP":8,"HQ":[51,51]}
]
}
]
}
]
}
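I want to know how many subdocuments contain GT: "1|0". In this case that is 1 in the first document, 2 in the second, and 0 in the third. I have tried the unwind and aggregate functions, but I am clearly not doing something right. When I try to count the subdocuments by the "GT" field, mongo complains: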
db.collection.aggregate([{$group: {"$studies.samples.formatdata.GT":1,_id:0}}])
because the field names in my group cannot contain ".", but if I leave the path out:
db.collection.aggregate([{$group: {"$GT":1,_id:0}}])
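it complains that "$GT" cannot be an operator name.
Any ideas?
When working with arrays you need to process them with $unwind, and here you need to do that three times, once per nested array: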
db.collection.aggregate([
    // Unwind the arrays so that each nested element becomes its own document
    { "$unwind": "$studies" },
    { "$unwind": "$studies.samples" },
    { "$unwind": "$studies.samples.formatdata" },
    // Group results to obtain the matched count per key
    { "$group": {
        "_id": "$studies.samples.formatdata.GT",
        "count": { "$sum": 1 }
    }}
])
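To make the effect of the three $unwind stages easier to see, here is a minimal sketch in plain JavaScript that imitates them on in-memory objects. It is not MongoDB's implementation: `unwind` and `setPath` are hypothetical helpers written for this illustration, the sample document is a trimmed version of the first one in the question, and non-array fields are simply skipped, which is a simplification.

```javascript
// Imitates one $unwind stage: emits one output document per array element.
// `path` is a dotted field path without the leading "$", e.g. "studies.samples".
function unwind(docs, path) {
  const keys = path.split(".");
  const out = [];
  for (const doc of docs) {
    const arr = keys.reduce((value, key) => (value == null ? undefined : value[key]), doc);
    if (!Array.isArray(arr)) continue; // simplification: missing or non-array fields are dropped
    for (const element of arr) {
      out.push(setPath(doc, keys, element));
    }
  }
  return out;
}

// Returns a shallow copy of `obj` with the value at `keys` replaced.
function setPath(obj, keys, value) {
  if (keys.length === 0) return value;
  const [head, ...rest] = keys;
  return { ...obj, [head]: setPath(obj[head], rest, value) };
}

// Trimmed version of the first document in the question.
const docs = [
  { _id: 1, studies: [{ study_id: "Study1", samples: [
    { sample_id: "NA00001", formatdata: [{ GT: "1|0" }] },
    { sample_id: "NA00002", formatdata: [{ GT: "0|0" }] },
  ] }] },
];

// Apply the three unwinds in the same order as the pipeline.
const flat = ["studies", "studies.samples", "studies.samples.formatdata"]
  .reduce((acc, path) => unwind(acc, path), docs);

// Tally by GT, which is what the $group stage with { $sum: 1 } does.
const counts = {};
for (const d of flat) {
  const gt = d.studies.samples.formatdata.GT;
  counts[gt] = (counts[gt] || 0) + 1;
}
console.log(counts);
```

After the third unwind every document holds exactly one formatdata entry, which is why the GT value can then be read through a simple dotted path.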
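Ideally you want to filter your input. One way is to apply $match both before and after the $unwind stages, using a regular expression to match values that begin with "1". Note that /^1/ also matches values such as "1|1"; use the exact string "1|0" if only that genotype should be counted.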
db.collection.aggregate([
    // Match first to exclude documents where this is not present in any array member
    { "$match": { "studies.samples.formatdata.GT": /^1/ } },
    // Unwind the arrays so that each nested element becomes its own document
    { "$unwind": "$studies" },
    { "$unwind": "$studies.samples" },
    { "$unwind": "$studies.samples.formatdata" },
    // Match again to keep only the unwound elements that satisfy the condition
    { "$match": { "studies.samples.formatdata.GT": /^1/ } },
    // Group results to obtain the matched count per document and key
    { "$group": {
        "_id": {
            "_id": "$_id",
            "key": "$studies.samples.formatdata.GT"
        },
        "count": { "$sum": 1 }
    }}
])
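Because both $match stages discard non-matching data, a document with no matching entry produces no group at all, so the third document from the question would be absent from the result rather than reported with a count of 0. As a cross-check of the expected numbers (1, 2 and 0), the following plain JavaScript sketch counts the matches per document on the client side. It is not an aggregation pipeline; `countGenotype` is a hypothetical helper and the documents are trimmed copies of those in the question.

```javascript
// Hypothetical helper: counts formatdata entries whose GT matches `pattern` in one document.
function countGenotype(doc, pattern) {
  let count = 0;
  for (const study of doc.studies || []) {
    for (const sample of study.samples || []) {
      for (const entry of sample.formatdata || []) {
        if (pattern.test(entry.GT)) count += 1;
      }
    }
  }
  return count;
}

// Trimmed copies of the three documents from the question.
const variants = [
  { _id: { chr: "20", pos: "14371" }, studies: [{ study_id: "Study1", samples: [
    { sample_id: "NA00001", formatdata: [{ GT: "1|0" }] },
    { sample_id: "NA00002", formatdata: [{ GT: "0|0" }] },
  ] }] },
  { _id: { chr: "20", pos: "14372" }, studies: [{ study_id: "Study3", samples: [
    { sample_id: "SAMPLE1", formatdata: [{ GT: "1|0" }] },
    { sample_id: "SAMPLE2", formatdata: [{ GT: "1|0" }] },
  ] }] },
  { _id: { chr: "20", pos: "14373" }, studies: [{ study_id: "Study3", samples: [
    { sample_id: "SAMPLE3", formatdata: [{ GT: "0|0" }] },
    { sample_id: "SAMPLE7", formatdata: [{ GT: "0|0" }] },
  ] }] },
];

// One result per document, including documents with zero matches.
const perDocument = variants.map((doc) => ({
  _id: doc._id,
  count: countGenotype(doc, /^1/),
}));
console.log(perDocument.map((result) => result.count));
```

Doing the count in application code keeps the zero-count documents, at the cost of transferring whole documents to the client.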
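Note that in all cases the "$"-prefixed entries are "variables" that refer to properties of the document. They are used as input "values" on the right-hand side. The "keys" on the left-hand side must be specified as plain strings. A variable cannot be used to name a key.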
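Applied to the $group stage from the question, the rule can be sketched as a plain JavaScript object. The variable names `groupByGenotype` and `outputKeys` are made up for this illustration; the stage itself has the same shape as the one used in the pipelines above.

```javascript
// Rejected form from the question: a "$"-prefixed, dotted field path used as a key.
// { $group: { "$studies.samples.formatdata.GT": 1, _id: 0 } }

// Accepted shape: plain keys on the left, "$"-prefixed field paths as values on the right.
const groupByGenotype = {
  $group: {
    _id: "$studies.samples.formatdata.GT", // value: field path read from each input document
    count: { $sum: 1 },                    // "count" is a plain output key chosen by the author
  },
};

// The output keys themselves contain neither "$" nor ".".
const outputKeys = Object.keys(groupByGenotype.$group);
console.log(outputKeys);
```

The only "$" names allowed on the left are operators such as $group and $sum; the field names of the result are ordinary strings.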