
Group array doc by sequence: MongoDB groupby or mapreduce?

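In MongoDB, I have a collection of documents, each containing an array of records. I want to group the records by identical tag while preserving their natural order.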

    {
            "day": "2019-01-07",
            "records": [
                {
                    "tag": "ch",
                    "unixTime": ISODate("2019-01-07T09:06:56Z"),
                    "score": 1
                },
                {
                    "tag": "u",
                    "unixTime": ISODate("2019-01-07T09:07:06Z"),
                    "score": 0
                },
                {
                    "tag": "ou",
                    "unixTime": ISODate("2019-01-07T09:07:06Z"),
                    "score": 0
                },
                {
                    "tag": "u",
                    "unixTime": ISODate("2019-01-07T09:07:20Z"),
                    "score": 0
                },
                {
                    "tag": "u",
                    "unixTime": ISODate("2019-01-07T09:07:37Z"),
                    "score": 1
                }
            ]
    }

I would like to group (and aggregate) the records by runs of consecutive identical tags, not simply by unique tag.

Desired output:

    {
            "day": "2019-01-07",
            "records": [
                {
                    "tag": "ch",
                    "unixTime": [ISODate("2019-01-07T09:06:56Z")],
                    "score": 1,
                    "nbRecords": 1
                },
                {
                    "tag": "u",
                    "unixTime": [ISODate("2019-01-07T09:07:06Z")],
                    "score": 0,
                    "nbRecords": 1
                },
                {
                    "tag": "ou",
                    "unixTime": [ISODate("2019-01-07T09:07:06Z")],
                    "score": 0
                },
                {
                    "tag": "u",
                    "unixTime": [ISODate("2019-01-07T09:07:20Z"), ISODate("2019-01-07T09:07:37Z")],
                    "score": 1,
                    "nbRecords": 2
                }
            ]
    }

Group by

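It seems that the '$group' aggregation stage in MongoDB groups the unwound array elements by unique field value, regardless of their position in the sequence: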

    db.coll.aggregate(
        [
            {"$unwind": "$records"},
            {"$group":
                {
                    "_id": {
                        "tag": "$records.tag",
                        "day": "$day"
                    },
                    ...
                }
            }
        ]
    )

returns

    {
            "day": "2019-01-07",
            "records": [
                {
                    "tag": "ch",
                    "unixTime": [ISODate("2019-01-07T09:06:56Z")],
                    "score": 1,
                    "nbRecords": 1
                },
                {
                    "tag": "u",
                    "unixTime": [ISODate("2019-01-07T09:07:06Z"), ISODate("2019-01-07T09:07:20Z"), ISODate("2019-01-07T09:07:37Z")],
                    "score": 1,
                    "nbRecords": 3
                },
                {
                    "tag": "ou",
                    "unixTime": [ISODate("2019-01-07T09:07:06Z")],
                    "score": 0
                }
            ]
    }
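
To make the difference concrete, here is a minimal plain-Python analogy of grouping by unique tag. It is not MongoDB code: the helper name `group_by_unique_tag` is hypothetical, `datetime` stands in for `ISODate`, and summing the scores is an assumption about the elided accumulators. It only illustrates why the three "u" records end up in a single bucket.

```python
from datetime import datetime

# Sample records from the question, with datetime standing in for ISODate.
records = [
    {"tag": "ch", "unixTime": datetime(2019, 1, 7, 9, 6, 56), "score": 1},
    {"tag": "u", "unixTime": datetime(2019, 1, 7, 9, 7, 6), "score": 0},
    {"tag": "ou", "unixTime": datetime(2019, 1, 7, 9, 7, 6), "score": 0},
    {"tag": "u", "unixTime": datetime(2019, 1, 7, 9, 7, 20), "score": 0},
    {"tag": "u", "unixTime": datetime(2019, 1, 7, 9, 7, 37), "score": 1},
]


def group_by_unique_tag(records):
    """Hypothetical helper: one bucket per distinct tag, ignoring sequence."""
    buckets = {}
    for rec in records:
        bucket = buckets.setdefault(
            rec["tag"],
            {"tag": rec["tag"], "unixTime": [], "score": 0, "nbRecords": 0},
        )
        bucket["unixTime"].append(rec["unixTime"])
        bucket["score"] += rec["score"]  # assumed accumulator: sum of scores
        bucket["nbRecords"] += 1
    return list(buckets.values())


by_tag = group_by_unique_tag(records)
# The isolated "u" record and the later run of two "u" records are merged.
print([(g["tag"], g["nbRecords"]) for g in by_tag])
```

Because the grouping key is the tag alone, nothing distinguishes the first "u" from the later run of "u" records, which is exactly the information the desired output needs.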

Map/Reduce

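As I am currently using the pymongo driver, I implemented the solution in Python using itertools.groupby, which, being a generator, groups while respecting the natural order. However, I am running into a server timeout problem (cursor.NotFound error) because the processing takes so long.

Any idea how to use MongoDB's mapreduce function directly to do the equivalent of itertools.groupby() in Python?

Any help would be much appreciated. I am using pymongo driver 3.8 and MongoDB 4.0.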

Yo! Run through the records array and add a new integer index that increments whenever the groupby target changes, then use MongoDB operations on that index.
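
The suggestion above could be sketched as follows. This is only one possible reading of it: the function name `add_run_index` and the field name `run` are hypothetical, and the example works on plain Python dicts rather than on a live collection.

```python
def add_run_index(records, key="tag"):
    """Return copies of the records with a hypothetical "run" field that
    increments every time the value under `key` changes."""
    indexed = []
    run = -1
    previous = object()  # sentinel that never equals a real tag
    for rec in records:
        if rec[key] != previous:
            run += 1
            previous = rec[key]
        indexed.append({**rec, "run": run})
    return indexed


# Tags in the same order as in the question's sample document.
records = [{"tag": t} for t in ["ch", "u", "ou", "u", "u"]]
indexed = add_run_index(records)
print([(r["tag"], r["run"]) for r in indexed])
```

Once each record carries such an index, grouping on the pair (day, run) instead of (day, tag) keeps the isolated "u" separate from the later run of "u" records.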

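Following @Ale's recommendation, and with no hint on how to do this inside MongoDB, I switched back to the Python implementation and solved the cursor.NotFound problem.

I thought I could do the job inside MongoDB, but this works: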

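    import itertools

    # `db` is assumed to be an existing pymongo Database handle.
    for r in db.coll.find():
        session = []
        # groupby only merges consecutive records with the same tag,
        # so the natural order of the array is preserved.
        for tag, time_score in itertools.groupby(r["records"], key=lambda x: x["tag"]):
            time_score = list(time_score)
            session.append({
                "tag": tag,
                "start": time_score[0]["unixTime"],
                "end": time_score[-1]["unixTime"],
                "ca": sum(n["score"] for n in time_score),
                "nb_records": len(time_score),
            })
        # Replace the raw records with the aggregated sessions.
        db.coll.update_one(
            {"_id": r["_id"]},
            {
                "$unset": {"records": ""},
                "$set": {"sessions": session},
            },
        )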
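
The grouping logic can also be checked without a database. The sketch below pulls it into a pure function (the name `build_sessions` is hypothetical) and runs it on the sample records from the question, with `datetime` standing in for `ISODate`.

```python
import itertools
from datetime import datetime


def build_sessions(records):
    """Collapse each run of consecutive records sharing a tag into one session."""
    sessions = []
    for tag, run in itertools.groupby(records, key=lambda rec: rec["tag"]):
        run = list(run)
        sessions.append({
            "tag": tag,
            "start": run[0]["unixTime"],
            "end": run[-1]["unixTime"],
            "ca": sum(rec["score"] for rec in run),
            "nb_records": len(run),
        })
    return sessions


sample = [
    {"tag": "ch", "unixTime": datetime(2019, 1, 7, 9, 6, 56), "score": 1},
    {"tag": "u", "unixTime": datetime(2019, 1, 7, 9, 7, 6), "score": 0},
    {"tag": "ou", "unixTime": datetime(2019, 1, 7, 9, 7, 6), "score": 0},
    {"tag": "u", "unixTime": datetime(2019, 1, 7, 9, 7, 20), "score": 0},
    {"tag": "u", "unixTime": datetime(2019, 1, 7, 9, 7, 37), "score": 1},
]

sessions = build_sessions(sample)
for s in sessions:
    print(s["tag"], s["ca"], s["nb_records"])
```

On this sample the function yields four sessions, with the last two "u" records merged into one session of two records, matching the desired output in the question.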

