
PyMongo group by multiple keys

With PyMongo, grouping by one key works fine:

results = collection.group(key={"scan_status":0}, condition={'date': {'$gte': startdate}}, initial={"count": 0}, reduce=reducer)

Results:

{u'count': 215339.0, u'scan_status': u'PENDING'}
{u'count': 617263.0, u'scan_status': u'DONE'}

But when I try to group by multiple keys, I get an exception:

results = collection.group(key={"scan_status":0,"date":0}, condition={'date': {'$gte': startdate}}, initial={"count": 0}, reduce=reducer)

How can I group by multiple fields correctly?

If you are trying to count over two keys, then while this is possible with .group(), the better option is .aggregate().

This uses native code operators rather than the interpreted JavaScript that .group() requires, and it performs the same basic grouping you are trying to achieve.

In particular, use the $group pipeline operator:

result = collection.aggregate([
    # Match only the documents on or after the start date
    { "$match": { "date": { "$gte": startdate } } },

    # Group the documents and "count" via $sum on the values
    { "$group": {
        "_id": {
            "scan_status": "$scan_status",
            "date": "$date"
        },
        "count": { "$sum": 1 }
    }}
])
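
The same pattern extends to any number of keys. As a minimal sketch, the hypothetical helper `build_group_pipeline` below (not part of PyMongo) builds the `$match` and `$group` stages from a list of field names, and `flatten` merges each result document's composite `_id` back into top-level fields. It assumes plain top-level field names without dots, and the `startdate` value is only an example.

```python
from datetime import datetime


def build_group_pipeline(keys, match=None):
    """Build a count-per-group pipeline for the given field names.

    Hypothetical helper, not a PyMongo API. Assumes plain top-level
    field names (no dots).
    """
    pipeline = []
    if match:
        pipeline.append({"$match": match})
    pipeline.append({
        "$group": {
            # One entry per grouping key, e.g. {"scan_status": "$scan_status"}
            "_id": {key: "$" + key for key in keys},
            "count": {"$sum": 1},
        }
    })
    return pipeline


def flatten(doc):
    """Merge the composite "_id" of a result document into its top level."""
    flat = dict(doc["_id"])
    flat.update((k, v) for k, v in doc.items() if k != "_id")
    return flat


startdate = datetime(2014, 1, 1)  # example value, an assumption
pipeline = build_group_pipeline(["scan_status", "date"],
                                {"date": {"$gte": startdate}})
# With a live collection (not run here):
# rows = [flatten(d) for d in collection.aggregate(pipeline)]
```

Flattening is optional; it only makes the rows resemble the shape that .group() returned in the question.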

In practice you probably want to reduce "date" to a distinct period such as a day, because grouping on the raw timestamp creates a separate group for every distinct value. For example:

result = collection.aggregate([
    # Match only the documents on or after the start date
    { "$match": { "date": { "$gte": startdate } } },

    # Group the documents and "count" via $sum on the values
    { "$group": {
        "_id": {
            "scan_status": "$scan_status",
            "date": {
                "year": { "$year": "$date" },
                "month": { "$month": "$date" },
                "day": { "$dayOfMonth": "$date" }
            }
        },
        "count": { "$sum": 1 }
    }}
])

This uses the Date Aggregation Operators ($year, $month and $dayOfMonth).
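
Each result of that pipeline carries the day as three separate integers inside "_id". If a single date value is more convenient on the client side, something like the following sketch could reassemble it. The helper name `id_to_row` and the sample document are made up for illustration; only the document shape comes from the pipeline above.

```python
from datetime import date


def id_to_row(doc):
    """Turn one result document into (scan_status, date, count)."""
    parts = doc["_id"]["date"]
    day = date(parts["year"], parts["month"], parts["day"])
    return doc["_id"]["scan_status"], day, doc["count"]


# Sample document in the shape the pipeline produces (made-up values).
sample = {
    "_id": {"scan_status": "DONE",
            "date": {"year": 2014, "month": 5, "day": 17}},
    "count": 42,
}
row = id_to_row(sample)
```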

Or perhaps with basic "date math":

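from datetime import datetime

# Epoch as a naive datetime, which PyMongo treats as UTC.
# A datetime.date (as returned by date.fromtimestamp(0)) cannot be
# encoded to BSON, so a datetime.datetime is used instead.
epoch = datetime(1970, 1, 1)

result = collection.aggregate([
    # Match only the documents on or after the start date
    { "$match": { "date": { "$gte": startdate } } },

    # Group the documents and "count" via $sum on the values.
    # Subtracting the epoch from a date yields milliseconds as an integer;
    # removing the remainder modulo one day rounds down to the start of the day.
    { "$group": {
        "_id": {
            "scan_status": "$scan_status",
            "date": {
                "$subtract": [
                    { "$subtract": [ "$date", epoch ] },
                    { "$mod": [
                        { "$subtract": [ "$date", epoch ] },
                        1000 * 60 * 60 * 24
                    ]}
                ]
            }
        },
        "count": { "$sum": 1 }
    }}
])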

This returns integer values (milliseconds since the epoch, rounded down to the start of the day) instead of a composite value object.
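
To read those integers on the client, they can be added back to the epoch. The sketch below mirrors the same subtract-and-modulo arithmetic in plain Python and converts a bucket value back to a datetime. The names `floor_to_day_ms` and `ms_to_datetime` are hypothetical, and naive datetimes are assumed to represent UTC.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)
MS_PER_DAY = 1000 * 60 * 60 * 24


def floor_to_day_ms(moment):
    """Milliseconds since the epoch, rounded down to the start of the day."""
    ms = (moment - EPOCH) // timedelta(milliseconds=1)
    return ms - ms % MS_PER_DAY


def ms_to_datetime(ms):
    """Convert an integer bucket value back to a naive UTC datetime."""
    return EPOCH + timedelta(milliseconds=ms)


bucket = floor_to_day_ms(datetime(2014, 5, 17, 13, 45, 10))
day = ms_to_datetime(bucket)
```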

All of these options are better than .group(), because they use natively coded routines and run much faster than the JavaScript code you would otherwise need to supply.


 