mongodb: query first few rows where sum of specific column is greater than or equal to C
Suppose we have a MongoDB collection with 2 columns: id, c
1,2
2,6
3,1
...
Now I would like to select the first few rows for which the running sum of column c becomes greater than or equal to C.
In the above case, if C=1, return the first row. If C=8, return the first 2 rows. If C=9, return the first 3 rows.
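To pin down the expected behaviour before looking at any query, here is a minimal plain-JavaScript sketch (not MongoDB code). The function name firstRowsReaching and the in-memory rows array are assumptions made for illustration only. The sketch also assumes that nothing should be returned when the total of column c never reaches C.

```javascript
// Hypothetical in-memory copy of the sample rows (id, c).
const rows = [
  { id: 1, c: 2 },
  { id: 2, c: 6 },
  { id: 3, c: 1 }
];

// Return the shortest prefix of `rows` whose sum of `c` is >= C,
// or an empty array if the total never reaches C.
function firstRowsReaching(rows, C) {
  let sum = 0;
  for (let i = 0; i < rows.length; i++) {
    sum += rows[i].c;
    if (sum >= C) {
      return rows.slice(0, i + 1);
    }
  }
  return [];
}

console.log(firstRowsReaching(rows, 8).map(r => r.id)); // [ 1, 2 ]
```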
Query
This can be done using the aggregation framework. Consider the following aggregation pipeline:
db.collectionName.aggregate([
    {
        $group:
        {
            "_id": null,
            "ds": { $push: "$$ROOT" },
            "cs": { $push: "$c" }
        }
    }, /* (1) */
    { $unwind: "$ds" }, /* (2) */
    {
        $project:
        {
            "_id": "$ds._id",
            "c": "$ds.c",
            "cs": { $slice: [ "$cs", "$ds._id" ] }
        }
    }, /* (3) */
    { $unwind: "$cs" }, /* (4) */
    {
        $group:
        {
            "_id": "$_id",
            "c": { $first: "$c" },
            "csum": { $sum: "$cs" }
        }
    }, /* (5) */
    {
        $group:
        {
            "_id": null,
            "ds": { $push: "$$ROOT" },
            "gteC":
            {
                $push:
                {
                    $cond:
                    {
                        if: { "$gte": [ "$csum", SET_DESIRED_VALUE_FOR_C_HERE ] },
                        then: "$$ROOT",
                        else: { }
                    }
                }
            }
        }
    }, /* (6) */
    {
        $project:
        {
            "_id": 0,
            "docs":
            {
                $filter:
                {
                    input: "$ds",
                    "as": "doc",
                    cond: { $lte: [ "$$doc.csum", { $min: "$gteC.csum" } ] }
                }
            }
        }
    }, /* (7) */
    { $unwind: "$docs" }, /* (8) */
    { $project: { "_id": "$docs._id", "c": "$docs.c" } } /* (9) */
]);
Results
C = 1 =>
{ "_id": 1, "c": 2 }
C = 8 =>
[ { "_id": 2, "c": 6 }, { "_id": 1, "c": 2 } ]
C = 9 =>
[ { "_id": 3, "c": 1 }, { "_id": 2, "c": 6 }, { "_id": 1, "c": 2 } ]
C = 10 =>
Explanation
The basic idea behind it is to construct a helper array for each document in the collection (stages 1-3)
{ "_id" : 1, "c" : 2 } -> cs = [ 2 ]
{ "_id" : 2, "c" : 6 } -> cs = [ 2, 6 ]
{ "_id" : 3, "c" : 1 } -> cs = [ 2, 6, 1 ]
using the $slice array aggregation operator, and then to replace it with the sum of all elements it contains (stages 4-5)
{ "_id" : 1, "c" : 2 } -> csum = 2
{ "_id" : 2, "c" : 6 } -> csum = 8
{ "_id" : 3, "c" : 1 } -> csum = 9
using the $unwind stage and the $sum group accumulator operator.
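The running-sum construction of stages 1-5 can be mirrored in plain JavaScript, which may make the role of $slice easier to see. This is only an in-memory sketch, not the pipeline itself. The names docs and withCsum are assumptions, and it relies on the same assumption as the pipeline: the _id values are 1, 2, 3, ... in document order, because _id is used as the slice length.

```javascript
// Sample documents; _id doubles as "how many c values to take".
const docs = [
  { _id: 1, c: 2 },
  { _id: 2, c: 6 },
  { _id: 3, c: 1 }
];

// Stage 1: collect every c value into one array.
const cs = docs.map(d => d.c);

// Stages 2-5: for each document, take the first _id elements of cs
// and replace that helper array with its sum.
const withCsum = docs.map(d => ({
  _id: d._id,
  c: d.c,
  csum: cs.slice(0, d._id).reduce((a, b) => a + b, 0)
}));

console.log(withCsum.map(d => d.csum)); // [ 2, 8, 9 ]
```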
Then construct another helper array containing the documents with csum >= C (stage 6)
/* Ex. (C = 8) */
gteC = [ { "_id" : 3, "c" : 1, "csum" : 9 }, { "_id" : 2, "c" : 6, "csum" : 8 } ]
The last step is to retrieve all documents with csum <= Min { gteC.csum }. This is done using the $filter array aggregation operator (stage 7).
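The selection logic of stages 6-7 can be sketched in plain JavaScript as follows. This is an in-memory illustration under stated assumptions rather than the pipeline itself: the helper name selectPrefix and the array summed are hypothetical, and the input is what the documents look like after stage 5.

```javascript
// Documents as they look after stage 5 (values from the example above).
const summed = [
  { _id: 1, c: 2, csum: 2 },
  { _id: 2, c: 6, csum: 8 },
  { _id: 3, c: 1, csum: 9 }
];

// Hypothetical helper mirroring stages 6-7 for a given threshold C.
function selectPrefix(summed, C) {
  // Stage 6: keep the documents whose running sum already reaches C.
  const gteC = summed.filter(d => d.csum >= C);
  if (gteC.length === 0) {
    return []; // the total never reaches C
  }
  // Stage 7: keep every document up to the smallest such running sum.
  const limit = Math.min(...gteC.map(d => d.csum));
  return summed.filter(d => d.csum <= limit);
}

console.log(selectPrefix(summed, 8).map(d => d._id)); // [ 1, 2 ]
```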
However, I am not sure this is the most efficient aggregation pipeline to achieve what you want (I will be grateful for any improvement suggestions).
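One possible improvement, offered only as a sketch and not as part of the original answer: on MongoDB 5.0 or later the running sum can be computed directly with the $setWindowFields stage, which avoids pushing the whole collection into a single document. The snippet below assumes documents are ordered by _id, that c values are non-negative and C is positive, and that nothing should be returned when the total is below C. The names C and pipeline are hypothetical, and the pipeline has not been benchmarked against the one above.

```javascript
// Hypothetical threshold; replace with the desired value of C.
const C = 8;

const pipeline = [
  {
    // Running sum up to the current document, plus the grand total.
    $setWindowFields: {
      sortBy: { _id: 1 },
      output: {
        csum: { $sum: "$c", window: { documents: ["unbounded", "current"] } },
        total: { $sum: "$c", window: { documents: ["unbounded", "unbounded"] } }
      }
    }
  },
  {
    // Keep a document if the total reaches C and the sum of the
    // documents before it (csum - c) is still below C.
    $match: {
      $expr: {
        $and: [
          { $gte: ["$total", C] },
          { $lt: [{ $subtract: ["$csum", "$c"] }, C] }
        ]
      }
    }
  },
  { $project: { _id: 1, c: 1 } }
];

// In the mongo shell: db.collectionName.aggregate(pipeline);
```

Unlike the pipeline above, this version does not depend on _id being the sequence 1, 2, 3, ...; it only uses _id for ordering.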
P.S. Before testing the query, don't forget to change the name of the collection and replace SET_DESIRED_VALUE_FOR_C_HERE.