
MongoDB count distinct items in an array

My actors collection contains an array-of-documents field called acted_in. Instead of returning the size of acted_in.idmovies like so: {$size: $acted_in.idmovies}, I want to return the number of distinct values inside $acted_in.idmovies. How can I do that?

c1 = actors.aggregate([{"$match": {'$and': [{'fname': f_name},
                                            {'lname': l_name}]}},
                       {"$project": {'first_name': '$fname',
                                     'last_name': '$lname',
                                     'gender': '$gender',
                                     'distinct_movies_played_in': {'$size': '$acted_in.idmovies'}}}])

You basically need to include $setDifference in there to obtain the "distinct" items. All "sets" are "distinct" by design, and by taking the "difference" between the present array and an empty one [] you get the desired result. Then you can apply $size.
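To make the idea concrete, here is a plain-Python sketch of what the two operators compute. It is only a model of the semantics, not MongoDB code: set_difference and distinct_count are hypothetical helper names, and the sample ids are made up.

```python
def set_difference(first, second):
    """Model of $setDifference: elements of `first` that are not in `second`,
    with duplicates dropped. MongoDB does not guarantee the output order;
    this model happens to keep first-seen order."""
    excluded = set(second)
    result = []
    for item in first:
        if item not in excluded and item not in result:
            result.append(item)
    return result


def distinct_count(values):
    """Model of {"$size": {"$setDifference": [values, []]}}."""
    return len(set_difference(values, []))


# Made-up sample of what "$acted_in.idmovies" might resolve to for one actor.
idmovies = [10, 12, 10, 15, 12]

print(len(idmovies))             # what a plain $size would report: 5
print(distinct_count(idmovies))  # distinct values only: 3
```

Subtracting the empty array removes nothing, so the only effect left is the set conversion, which is exactly the de-duplication wanted here.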

You also have some common mistakes/misconceptions. Firstly, when using $match or any MongoDB query expression, you do not need $and unless there is an explicit case for it. All query expression arguments are already AND conditions unless explicitly stated otherwise, as with $or. So don't use it explicitly in this case.
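As a small illustration, the filter documents below are ordinary Python dicts of the kind passed to $match. The names and values are made-up samples, and the "same key twice" case is just one example of when an explicit $and is actually required.

```python
f_name, l_name = "Tom", "Hanks"  # made-up sample values

# Implicit AND: every key in the filter document has to match.
implicit_filter = {"fname": f_name, "lname": l_name}

# The explicit form selects the same documents here, it is only noisier.
explicit_filter = {"$and": [{"fname": f_name}, {"lname": l_name}]}

# One case where $and is needed: two conditions under the same key.
# A dict cannot hold "$or" twice, so the later entry silently replaces
# the earlier one and the first condition is lost.
lost_condition = {
    "$or": [{"fname": "Tom"}, {"fname": "Thomas"}],
    "$or": [{"gender": "M"}, {"gender": None}],
}

# Wrapping both in $and keeps them as separate conditions.
kept_conditions = {
    "$and": [
        {"$or": [{"fname": "Tom"}, {"fname": "Thomas"}]},
        {"$or": [{"gender": "M"}, {"gender": None}]},
    ]
}

print(len(lost_condition))           # 1
print(len(kept_conditions["$and"]))  # 2
```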

Secondly, your $project was using explicit field path variables for every field. You do not need to do that just to return a field; outside of usage in an "expression", you can simply use 1 to denote that you want it included:

c1 = actors.aggregate([
 { "$match": { "fname": f_name, "lname": l_name } },
 { "$project": {
   # 1 includes a field under its stored name; a rename such as
   # "first_name": "$fname" still needs the field path form
   "fname": 1,
   "lname": 1,
   "gender": 1,
   "distinct_movies_played_in": {
     "$size": { "$setDifference": [ "$acted_in.idmovies", [] ] }
   }
  }}
])

In fact, if you are using MongoDB 3.4 or greater (and your notation for an element within an array, "$acted_in.idmovies", says you have at least MongoDB 3.2), which supports $addFields, then use that instead of specifying all the other fields in the document.

c1 = actors.aggregate([
 { "$match": { "fname": f_name, "lname": l_name } },
 { "$addFields": {
   "distinct_movies_played_in": {
     "$size": { "$setDifference": [ "$acted_in.idmovies", [] ] }
   }
  }}
])

Unless you explicitly need to specify just "some" of the other fields.
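The choice between the two stages can be sketched as a small helper that only builds the stage document. distinct_count_stage is a hypothetical name, not part of pymongo, and the match values are made-up samples.

```python
def distinct_count_stage(array_path, out_field, keep_fields=None):
    """Build a stage that stores the distinct-value count of `array_path`.

    With keep_fields=None the stage is $addFields, so every existing field
    is kept. With a list of field names the stage is $project, so only
    those fields (plus _id and the new count) are returned.
    """
    count_expr = {"$size": {"$setDifference": [array_path, []]}}
    if keep_fields is None:
        return {"$addFields": {out_field: count_expr}}
    projection = {name: 1 for name in keep_fields}
    projection[out_field] = count_expr
    return {"$project": projection}


pipeline = [
    {"$match": {"fname": "Tom", "lname": "Hanks"}},  # made-up sample values
    distinct_count_stage("$acted_in.idmovies", "distinct_movies_played_in"),
]
print(pipeline[1])
```

The resulting list can be passed to actors.aggregate(pipeline) in the same way as the literal pipelines in this answer.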

The basic rule here is: do not use $unwind for array operations unless you specifically need to perform a $group operation with its _id key pointing at a value obtained from "within" the array.

In all other cases, MongoDB has far more efficient operators for working with arrays than $unwind.

This should give you what you want:

actors.aggregate([
    {
        $match: {fname: f_name, lname: l_name}
    }, 
    {
        $unwind: '$tags'
    }, 
    {
        $group: {
                    _id: '$_id', 
                    first_name: {$first: '$fname'}, 
                    last_name: {$last: '$lname'}, 
                    gender: {$first: '$gender'}, 
                    tags: {$addToSet: '$tags'}
                }
    }, 
    {
        $project: {
                      first_name: 1, 
                      last_name: 1, 
                      gender: 1, 
                      distinct: {$size: '$tags'}
                  }
    }
])

After the tags array is deconstructed and then collected back into a set of its own values, you just need to get the number of items, or length, of that set.
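The same steps can be traced in plain Python. This is a rough model of the stages, not MongoDB code: unwind and group_add_to_set are hypothetical helpers, the sample document is made up, and the field is assumed to always hold an array.

```python
def unwind(docs, field):
    """Model of $unwind: one output document per array element.
    Documents without the field produce no output."""
    for doc in docs:
        for element in doc.get(field, []):
            yield {**doc, field: element}


def group_add_to_set(docs, field):
    """Model of $group on _id with $addToSet applied to `field`."""
    groups = {}
    for doc in docs:
        bucket = groups.setdefault(doc["_id"], [])
        if doc[field] not in bucket:
            bucket.append(doc[field])
    return groups


docs = [{"_id": 1, "tags": ["a", "b", "a", "c", "b"]}]  # made-up sample

unwound = list(unwind(docs, "tags"))
sets = group_add_to_set(unwound, "tags")
distinct = {doc_id: len(values) for doc_id, values in sets.items()}

print(len(unwound))  # 5 intermediate documents, one per array element
print(distinct)      # {1: 3}
```

The model makes the cost visible: every array element becomes its own intermediate document before the group stage collapses them again.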


 