MongoDB count distinct items in an array
My actors collection contains an array-of-documents field called acted_in. Instead of returning the size of acted_in.idmovies, like so: {$size: $acted_in.idmovies}, I want to return the number of distinct values inside $acted_in.idmovies. How can I do that?
c1 = actors.aggregate([
    {"$match": {'$and': [{'fname': f_name}, {'lname': l_name}]}},
    {"$project": {
        'first_name': '$fname',
        'last_name': '$lname',
        'gender': '$gender',
        'distinct_movies_played_in': {'$size': '$acted_in.idmovies'}
    }}
])
You basically need to include $setDifference in there to obtain the "distinct" items. All "sets" are "distinct" by design, and by taking the "difference" between the present array and an empty one [] you get the desired result. Then you can apply $size to it.
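To make the set semantics concrete, here is a minimal pure-Python sketch of what the expression works out to for a single document. The sample document and the helper name distinct_count are illustrative assumptions, not part of the original collection; on a real deployment the work is done server-side by $setDifference and $size.

```python
# Hypothetical sample document shaped like the "actors" collection in the question.
doc = {
    "fname": "Jane",
    "lname": "Doe",
    "acted_in": [{"idmovies": 10}, {"idmovies": 20}, {"idmovies": 10}],
}


def distinct_count(values):
    """Emulate {"$size": {"$setDifference": [values, []]}} in plain Python."""
    # A set difference against an empty set keeps every value, but only once.
    return len(set(values) - set())


# "$acted_in.idmovies" resolves to the list of idmovies values across the array.
idmovies = [entry["idmovies"] for entry in doc["acted_in"]]

plain_size = len(idmovies)                # what {"$size": "$acted_in.idmovies"} counts
distinct_size = distinct_count(idmovies)  # what the $setDifference version counts

print(plain_size, distinct_size)
```

The two numbers differ whenever the same idmovies value appears more than once in acted_in.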
You also have some common mistakes/misconceptions. Firstly, when using $match or any MongoDB query expression, you do not need to use $and unless there is an explicit case to do so. All query expression arguments are "already" AND conditions unless explicitly stated otherwise, as with $or. So don't use $and explicitly in this case.
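As a rough illustration of when $and is and is not needed, the sketch below builds the filters as plain Python dicts. The helper names and the fields a, b, c and d are hypothetical; the one "explicit case" shown is a Python-side one, where a dict literal cannot hold the same key twice.

```python
def implicit_filter(f_name, l_name):
    """Both conditions must hold: top-level keys are ANDed by the server."""
    return {"fname": f_name, "lname": l_name}


def explicit_filter(f_name, l_name):
    """The same query spelled out with a redundant $and."""
    return {"$and": [{"fname": f_name}, {"lname": l_name}]}


# Hypothetical case where $and is needed: two $or clauses at the same level.
# A Python dict literal keeps only the last duplicate key, so the first
# clause is silently dropped here.
collapsed = {
    "$or": [{"a": 1}, {"b": 1}],
    "$or": [{"c": 1}, {"d": 1}],
}

# Wrapping both clauses in $and preserves them.
preserved = {
    "$and": [
        {"$or": [{"a": 1}, {"b": 1}]},
        {"$or": [{"c": 1}, {"d": 1}]},
    ]
}

print(implicit_filter("Jane", "Doe"))
print(collapsed)
```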
Secondly, your $project was using explicit field path variables for every field. You do not need to do that just to return a field under its existing name; outside of usage in an "expression" (such as renaming fname to first_name), you can simply use 1 to denote that you want it included:
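c1 = actors.aggregate([
    # Top-level keys in $match are implicitly ANDed
    { "$match": { "fname": f_name, "lname": l_name } },
    { "$project": {
        # Renaming still needs a field path; 1 only includes a field under its existing name
        "first_name": "$fname",
        "last_name": "$lname",
        "gender": 1,
        "distinct_movies_played_in": {
            "$size": { "$setDifference": [ "$acted_in.idmovies", [] ] }
        }
    }}
])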
In fact, if you are actually using MongoDB 3.4 or greater (and your notation of an element within an array, "$acted_in.idmovies", says you have at least MongoDB 3.2), which has support for $addFields, then use that instead of specifying all the other fields in the document.
c1 = actors.aggregate([
    { "$match": { "fname": f_name, "lname": l_name } },
    { "$addFields": {
        "distinct_movies_played_in": {
            "$size": { "$setDifference": [ "$acted_in.idmovies", [] ] }
        }
    }}
])
That is, unless you explicitly need to return only "some" of the other fields.
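One way to keep both variants at hand is to build the pipeline as plain Python data and choose the final stage with a flag. This is only a sketch: the function name distinct_movies_pipeline and the keep_all_fields parameter are made up for illustration, and the $project branch lists the fields under their existing names.

```python
def distinct_movies_pipeline(f_name, l_name, keep_all_fields=True):
    """Build the aggregation pipeline as a list of stage dicts."""
    distinct_count = {"$size": {"$setDifference": ["$acted_in.idmovies", []]}}
    match = {"$match": {"fname": f_name, "lname": l_name}}
    if keep_all_fields:
        # $addFields (MongoDB 3.4+) keeps every existing field and adds the new one.
        shape = {"$addFields": {"distinct_movies_played_in": distinct_count}}
    else:
        # $project returns only the fields listed here (plus _id by default).
        shape = {"$project": {
            "fname": 1,
            "lname": 1,
            "gender": 1,
            "distinct_movies_played_in": distinct_count,
        }}
    return [match, shape]


pipeline = distinct_movies_pipeline("Jane", "Doe")
print(pipeline)

# With a live connection (assumed, not shown here) this would be run as:
#     c1 = actors.aggregate(pipeline)
```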
The basic rule here is: do not use $unwind for array operations unless you specifically need to perform a $group operation with its _id key pointing at a value obtained from "within" the array.
In all other cases, MongoDB has far more efficient operators for working with arrays than $unwind.
This should give you what you want:
actors.aggregate([
    {
        $match: {fname: f_name, lname: l_name}
    },
    {
        $unwind: '$tags'
    },
    {
        $group: {
            _id: '$_id',
            first_name: {$first: '$fname'},
            last_name: {$last: '$lname'},
            gender: {$first: '$gender'},
            tags: {$addToSet: '$tags'}
        }
    },
    {
        $project: {
            first_name: 1,
            last_name: 1,
            gender: 1,
            distinct: {$size: '$tags'}
        }
    }
])
After the tags array is deconstructed and then regrouped into a set, you just need to get the number of items in that set. (Here tags stands in for your array field; for the collection in the question you would unwind $acted_in and add $acted_in.idmovies to the set.)
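For comparison, here is a hedged sketch of the same $unwind/$group idea written as Python data against the acted_in.idmovies field from the question. The adaptation, the function names and the intermediate field name movies are assumptions, and the small emulation function only mimics what $unwind followed by $addToSet does to one document.

```python
def unwind_group_pipeline(f_name, l_name):
    """The $unwind/$group approach, adapted (as an assumption) to acted_in.idmovies."""
    return [
        {"$match": {"fname": f_name, "lname": l_name}},
        {"$unwind": "$acted_in"},
        {"$group": {
            "_id": "$_id",
            "first_name": {"$first": "$fname"},
            "last_name": {"$first": "$lname"},
            "gender": {"$first": "$gender"},
            "movies": {"$addToSet": "$acted_in.idmovies"},
        }},
        {"$project": {
            "first_name": 1,
            "last_name": 1,
            "gender": 1,
            "distinct": {"$size": "$movies"},
        }},
    ]


def emulate_unwind_add_to_set(acted_in):
    """Plain-Python stand-in for $unwind followed by $addToSet on one document."""
    seen = []
    for entry in acted_in:                 # $unwind: one step per array element
        if entry["idmovies"] not in seen:  # $addToSet: keep each value once
            seen.append(entry["idmovies"])
    return seen


# Hypothetical acted_in array with one repeated movie id.
sample = [{"idmovies": 10}, {"idmovies": 20}, {"idmovies": 10}]
print(len(emulate_unwind_add_to_set(sample)))
```

The count matches what the set-operator approach gives, but this pipeline needs two extra stages to get there.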
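Notice: Technical posts on this site are licensed under CC BY-SA 4.0. If you repost this content, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.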