简体   繁体   English

子文档中的行的Mongoose聚合“$ sum”

[英]Mongoose aggregation “$sum” of rows in sub document

I'm fairly good with sql queries, but I can't seem to get my head around grouping and getting sum of mongo db documents, 我对sql查询相当不错,但我似乎无法理解分组和获取mongo db文件的总和,

With this in mind, I have a job model with schema like below : 考虑到一点,我有一个如下模式的工作模型:

    {
        name: {
            type: String,
            required: true
        },
        info: String,
        active: {
            type: Boolean,
            default: true
        },
        all_service: [

            price: {
                type: Number,
                min: 0,
                required: true
            },
            all_sub_item: [{
                name: String,
                price:{ // << -- this is the price I want to calculate
                    type: Number,
                    min: 0
                },
                owner: {
                    user_id: {  //  <<-- here is the filter I want to put
                        type: Schema.Types.ObjectId,
                        required: true
                    },
                    name: String,
                    ...
                }
            }]

        ],
        date_create: {
            type: Date,
            default : Date.now
        },
        date_update: {
            type: Date,
            default : Date.now
        }
    }

I would like to have a sum of price column, where owner is present, I tried below but no luck 我想有一个price列的总和, owner在场,我尝试下面,但没有运气

 Job.aggregate(
        [
            {
                $group: {
                    _id: {}, // not sure what to put here
                    amount: { $sum: '$all_service.all_sub_item.price' }
                },
                $match: {'not sure how to limit the user': given_user_id}
            }
        ],
        //{ $project: { _id: 1, expense: 1 }}, // you can only project fields from 'group'
        function(err, summary) {
            console.log(err);
            console.log(summary);
        }
    );

Could someone guide me in the right direction. 有人可以引导我朝着正确的方向前进。 thank you in advance 先感谢您

Primer 底漆


As is correctly noted earlier, it does help to think of an aggregation "pipeline" just as the "pipe" | 正如前面正确提到的,它确实有助于将聚合“管道”视为“管道” | operator from Unix and other system shells. 来自Unix和其他系统shell的运算符。 One "stage" feeds input to the "next" stage and so on. 一个“阶段”将输入馈送到“下一个”阶段,依此类推。

The thing you need to be careful with here is that you have "nested" arrays, one array within another, and this can make drastic differences to your expected results if you are not careful. 你需要注意的是你有“嵌套”数组,一个数组在另一个数组中,如果你不小心,这会对你的预期结果产生巨大的差异。

Your documents consist of an "all_service" array at the top level. 您的文档由顶层的“all_service”数组组成。 Presumably there are often "multiple" entries here, all containing your "price" property as well as "all_sub_item". 据推测,这里通常有“多个”条目,都包含您的“价格”属性以及“all_sub_item”。 Then of course "all_sub_item" is an array in itself, also containg many items of it's own. 然后当然“all_sub_item”本身就是一个数组,也包含了它自己的许多项目。

You can think of these arrays as the "relations" between your tables in SQL, in each case a "one-to-many". 您可以将这些数组视为SQL中表之间的“关系”,在每种情况下都是“一对多”。 But the data is in a "pre-joined" form, where you can fetch all data at once without performing joins. 但是数据处于“预加入”形式,您可以在其中一次获取所有数据而不执行连接。 That much you should already be familiar with. 你应该已经熟悉了很多。

However, when you want to "aggregate" accross documents, you need to "de-normalize" this in much the same way as in SQL by "defining" the "joins". 但是,当您想要跨文档“聚合”时,您需要通过“定义”“连接”来与SQL中的“反规范化”方式大致相同。 This is to "transform" the data into a de-normalized state that is suitable for aggregation. 这是为了将数据“转换”为适合于聚合的去标准化状态。

So the same visualization applies. 因此同样的可视化适用。 A master document's entries are replicated by the number of child documents, and a "join" to an "inner-child" will replicate both the master and initial "child" accordingly. 主文档的条目由子文档的数量复制,并且“加入”到“内部子”将相应地复制主文件和初始“子”。 In a "nutshell", this: 在“简言之”,这:

{
    "a": 1,
    "b": [
        { 
            "c": 1,
            "d": [
                { "e": 1 }, { "e": 2 }
            ]
        },
        { 
            "c": 2,
            "d": [
                { "e": 1 }, { "e": 2 }
            ]
        }
    ]
}

Becomes this: 变成这样:

{ "a" : 1, "b" : { "c" : 1, "d" : { "e" : 1 } } }
{ "a" : 1, "b" : { "c" : 1, "d" : { "e" : 2 } } }
{ "a" : 1, "b" : { "c" : 2, "d" : { "e" : 1 } } }
{ "a" : 1, "b" : { "c" : 2, "d" : { "e" : 2 } } }

And the operation to do this is $unwind , and since there are multiple arrays then you need to $unwind both of them before continuing any processing: 执行此操作的操作是$unwind ,并且由于有多个数组,因此您需要在继续任何处理之前将它们$unwind两个:

db.collection.aggregate([
    { "$unwind": "$b" },
    { "$unwind": "$b.d" }
])

So there the "pipe" first array from "$b" like so: 所以“管道”第一个数组来自“$ b”,如下所示:

{ "a" : 1, "b" : { "c" : 1, "d" : [ { "e" : 1 }, { "e" : 2 } ] } }
{ "a" : 1, "b" : { "c" : 2, "d" : [ { "e" : 1 }, { "e" : 2 } ] } }

Which leaves a second array referenced by "$bd" to further be de-normalized into the the final de-normalized result "without any arrays". 这使得“$ bd”引用的第二个数组进一步被去标准化为最终的非标准化结果“没有任何数组”。 This allows other operations to process. 这允许其他操作进行处理。

Solving 解决


With just about "every" aggregation pipeline, the "first" thing you want to do is "filter" the documents to only those that contain your results. 对于几乎“每个”聚合管道,您要做的“第一”事情是“过滤”文档,只包含那些包含结果的文档。 This is a good idea, as especially when doing operations such as $unwind , then you don't want to be doing that on documents that do not even match your target data. 这是一个好主意,特别是在执行诸如$unwind操作时,您不希望在与您的目标数据甚至不匹配的文档上执行此操作。

So you need to match your "user_id" at the array depth. 所以你需要在数组深度匹配你的“user_id”。 But this is only part of getting the result, since you should be aware of what happens when you query a document for a matching value in an array. 但这只是获取结果的一部分,因为您应该知道在查询文档中的数组中匹配值时会发生什么。

Of course, the "whole" document is still returned, because this is what you really asked for. 当然,仍然会返回“整个”文档,因为这是您真正要求的。 The data is already "joined" and we haven't asked to "un-join" it in any way.You look at this just as a "first" document selection does, but then when "de-normalized", every array element now actualy represents a "document" in itself. 数据已经“加入”了,我们没有要求以任何方式“取消加入”它。你看这个就像“第一个”文档选择那样,但是当“去规范化”时,每个数组元素现在,实际上代表了一个“文件”。

So not "only" do you $match at the beginning of the "pipeline", you also $match after you have processed "all" $unwind statements, down to the level of the element you wish to match. 因此,在“管道”的开头不是“仅” $match ,在处理“所有” $unwind语句后,您还需要$match ,直到您希望匹配的元素级别。

Job.aggregate(
    [
        // Match to filter possible "documents"
        { "$match": { 
            "all_service.all_sub_item.owner": given_user_id
        }},

        // De-normalize arrays
        { "$unwind": "$all_service" },
        { "$unwind": "$all_service.all_subitem" },

        // Match again to filter the array elements
        { "$match": { 
            "all_service.all_sub_item.owner": given_user_id
        }},

        // Group on the "_id" for the "key" you want, or "null" for all
        { "$group": {
            "_id": null,
            "total": { "$sum": "$all_service.all_sub_item.price" }
        }}

    ],
    function(err,results) {

    }
)

Alternately, modern MongoDB releases since 2.6 also support the $redact operator. 或者,自2.6以来的现代MongoDB版本也支持$redact运算符。 This could be used in this case to "pre-filter" the array content before processing with $unwind : 在这种情况下,这可以用于在使用$unwind处理之前“预过滤”数组内容:

Job.aggregate(
    [
        // Match to filter possible "documents"
        { "$match": { 
            "all_service.all_sub_item.owner": given_user_id
        }},

        // Filter arrays for matches in document
        { "$redact": {
            "$cond": {
                "if": { 
                    "$eq": [ 
                        { "$ifNull": [ "$owner", given_user_id ] },
                        given_user_id
                    ]
                },
                "then": "$$DESCEND",
                "else": "$$PRUNE"
            }
        }},

        // De-normalize arrays
        { "$unwind": "$all_service" },
        { "$unwind": "$all_service.all_subitem" },

        // Group on the "_id" for the "key" you want, or "null" for all
        { "$group": {
            "_id": null,
            "total": { "$sum": "$all_service.all_sub_item.price" }
        }}

    ],
    function(err,results) {

    }
)

That can "recursively" traverse the document and test for the condition, effectively removing any "un-matched" array elements before you even $unwind . 这可以“递归地”遍历文档并测试条件,在你$unwind之前有效地删除任何“未匹配”的数组元素。 This can speed things up a bit since items that do not match would not need to be "un-wound". 这可以加快速度,因为不匹配的项目不需要“无伤口”。 However there is a "catch" in that if for some reason the "owner" did not exist on an array element at all, then the logic required here would count that as another "match". 然而,如果由于某种原因,“所有者”根本不存在于数组元素上,则存在“捕获”,那么此处所需的逻辑将其视为另一个“匹配”。 You can always $match again to be sure, but there is still a more efficient way to do this: 您总是可以再次$match以确定,但仍有更有效的方法来执行此操作:

Job.aggregate(
    [
        // Match to filter possible "documents"
        { "$match": { 
            "all_service.all_sub_item.owner": given_user_id
        }},

        // Filter arrays for matches in document
        { "$project": {
            "all_items": {
              "$setDifference": [
                { "$map": {
                  "input": "$all_service",
                  "as": "A",
                  "in": {
                    "$setDifference": [
                      { "$map": {
                        "input": "$$A.all_sub_item",
                        "as": "B",
                        "in": {
                          "$cond": {
                            "if": { "$eq": [ "$$B.owner", given_user_id ] },
                            "then": "$$B",
                            "else": false
                          }
                        }
                      }},
                      false
                    ]          
                  }
                }},
                [[]]
              ]
            }
        }},


        // De-normalize the "two" level array. "Double" $unwind
        { "$unwind": "$all_items" },
        { "$unwind": "$all_items" },

        // Group on the "_id" for the "key" you want, or "null" for all
        { "$group": {
            "_id": null,
            "total": { "$sum": "$all_items.price" }
        }}

    ],
    function(err,results) {

    }
)

That process cuts down the size of the items in both arrays "drastically" compared to $redact . $redact相比,该过程“大幅”减少了两个数组中项目的大小。 The $map operator processes each elment of an array to the given statement within "in". $map运算符将数组的每个元素处理到“in”中的给定语句。 In this case, each "outer" array elment is sent to another $map to process the "inner" elements. 在这种情况下,每个“外部”数组元素被发送到另一个$map以处理“内部”元素。

A logical test is performed here with $cond whereby if the "condiition" is met then the "inner" array elment is returned, otherwise the false value is returned. 这里使用$cond执行逻辑测试,如果满足“condiition”,则返回“inner”数组元素,否则返回false值。

The $setDifference is used to filter down any false values that are returned. $setDifference用于过滤掉返回的任何false值。 Or as in the "outer" case, any "blank" arrays resulting from all false values being filtered from the "inner" where there is no match there. 或者在“外部”情况下,由所有false值导致的任何“空白”数组从“内部”过滤而在那里没有匹配。 This leaves just the matching items, encased in a "double" array, eg: 这只留下匹配的项目,包含在“双”数组中,例如:

[[{ "_id": 1, "price": 1, "owner": "b" },{..}],[{..},{..}]]

As "all" array elements have an _id by default with mongoose (and this is a good reason why you keep that) then every item is "distinct" and not affected by the "set" operator, apart from removing the un-matched values. 因为“所有”数组元素默认使用带有mongoose的_id (这是你保留它的一个很好的理由)然后每个项目都是“不同的”并且不受“set”运算符的影响,除了删除不匹配的值。

Process $unwind "twice" to convert these into plain objects in their own documents, suitable for aggregation. 处理$unwind “两次”将它们转换为自己文档中的普通对象,适合聚合。

So those are the things you need to know. 所以这些是你需要知道的事情。 As I stated earlier, be "aware" of how the data "de-normalizes" and what that implies towards your end totals. 正如我之前所说的那样,要“了解”数据如何“去标准化”以及这意味着对你的最终总数。

It sounds like you want to, in SQL equivalent, do "sum (prices) WHERE owner IS NOT NULL" . 这听起来像你想要的,在SQL等价物中,做"sum (prices) WHERE owner IS NOT NULL"

On that assumption, you'll want to do your $match first, to reduce the input set to your sum. 根据这个假设,您将首先要进行$ match,以减少输入设置为总和。 So your first stage should be something like 所以你的第一阶段应该是这样的

$match: { all_service.all_sub_items.owner : { $exists: true } }

Think of this as then passing all matching documents to your second stage. 想一想,然后将所有匹配的文档传递到第二阶段。

Now, because you are summing an array, you have to do another step. 现在,因为你要对一个数组求和,你必须再做一步。 Aggregation operators work on documents - there isn't really a way to sum an array. 聚合运算符处理文档 - 实际上没有一种方法可以对数组求和。 So we want to expand your array so that each element in the array gets pulled out to represent the array field as a value, in its own document. 因此,我们希望扩展您的数组,以便在其自己的文档中拉出数组中的每个元素以将数组字段表示为值。 Think of this as a cross join. 将此视为交叉连接。 This will be $unwind . 这将是$放松

$unwind: { "$all_service.all_sub_items" }

Now you've just made a much larger number of documents, but in a form where we can sum them. 现在你已经制作了更多的文档,但是我们可以将它们相加。 Now we can perform the $group. 现在我们可以执行$ group了。 In your $group, you specify a transformation. 在$ group中,指定转换。 The line: 这条线:

_id: {}, // not sure what to put here

is creating a field in the output document , which is not the same documents as the input documents. 输出文档中创建一个字段,该字段与输入文档不同。 So you can make the _id here anything you'd like, but think of this as the equivalent to your "GROUP BY" in sql. 所以你可以在这里做任何你想要的东西,但是把它当作sql中的“GROUP BY”的等价物。 The $sum operator will essentially be creating a sum for each group of documents you create here that match that _id - so essentially we'll be "re-collapsing" what you just did with $unwind, by using the $group. $ sum运算符实际上是为你在这里创建的每个文档组创建一个与_id匹配的总和 - 所以基本上我们将使用$ group“重新折叠”你刚刚使用$ unwind做的事情。 But this will allow $sum to work. 但这将允许$ sum工作。

I think you're looking for grouping on just your main document id, so I think your $sum statement in your question is correct. 我认为您正在寻找仅对您的主文档ID进行分组,因此我认为您的$ sum语句在您的问题中是正确的。

$group : { _id : $_id, totalAmount : { $sum : '$all_service.all_sub_item.price' } }

This will output documents with an _id field equivalent to your original document ID, and your sum. 这将输出文件,其中_id字段等同于您的原始文档ID和您的总和。

I'll let you put it together, I'm not super familiar with node. 我会让你把它放在一起,我不是很熟悉节点。 You were close but I think moving your $match to the front and using an $unwind stage will get you where you need to be. 你很接近,但我认为将你的$ match转移到前面并使用$ unwind阶段将获得你需要的位置。 Good luck! 祝好运!

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM