
How to sum values in a nested date range in MongoDB

I need to sum the values for 2018-06-01 through 2018-06-30 for each document in the collection. Each key in "days" is a different date and value. What should the mongo aggregate command look like? The result should look something like { _id: Product_123, June_Sum: value }.

That's really not a great structure for the sort of operation you now want to do. The whole point of keeping data in such a format is that you "increment" it as you go.

For example:

 var now = Date.now(),
     today = new Date(now - ( now % ( 1000 * 60 * 60 * 24 ))).toISOString().substr(0,10);

 var product = "Product_123";

 db.counters.updateOne(
   { 
     "month": today.substr(0,7),
     "product": product
   },
   { 
     "$inc": { 
       [`dates.${today}`]: 1,
       "totals": 1
     }
   },
   { "upsert": true }
 )

In that way the subsequent updates with $inc apply to the "key" used for the "date" and also increment the "totals" property of the matched document. So after a few iterations you would end up with something like:

{
        "_id" : ObjectId("5af395c53945a933add62173"),
        "product": "Product_123",
        "month": "2018-05",
        "dates" : {
                "2018-05-10" : 2,
                "2018-05-09" : 1
        },
        "totals" : 3
}

If you're not actually doing that then you "should" be, since it's the intended usage pattern for such a structure.

Without keeping a "totals" or similar entry within the document(s) storing these keys, the only methods left for "aggregation" in processing are to effectively coerce the "keys" into an "array" form.
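To make that coercion concrete, here is a minimal plain-JavaScript sketch of the same idea, run client-side with no server involved. The function name `sumDaysInRange` and the sample values are illustrative assumptions; the `{ k, v }` pair shape mirrors what `$objectToArray` produces.

```javascript
// Hypothetical helper: sums the values of "days" whose date keys fall in [start, end).
// ISO "YYYY-MM-DD" strings sort lexically in date order, so string comparison is enough.
function sumDaysInRange(days, start, end) {
  return Object.entries(days)
    .map(([k, v]) => ({ k, v }))              // same pair shape as $objectToArray
    .filter(({ k }) => k >= start && k < end) // keep only keys inside the range
    .reduce((sum, { v }) => sum + v, 0);
}

// Sample document; the shape is assumed from the question, the numbers are made up
const sampleDoc = {
  product: "Product_123",
  days: { "2018-05-31": 4, "2018-06-01": 2, "2018-06-15": 5, "2018-07-01": 7 }
};

const juneSum = sumDaysInRange(sampleDoc.days, "2018-06-01", "2018-07-01");
console.log({ _id: sampleDoc.product, June_Sum: juneSum });
```

Every key has to be visited and compared on every read, which is exactly the cost the server-side versions also pay.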

MongoDB 3.6 with $objectToArray

db.collection.aggregate([
  // Only consider documents with entries within the range
  { "$match": {
    "$expr": {
      "$anyElementTrue": {
        "$map": {
          "input": { "$objectToArray": "$days" },
          "in": {
            "$and": [
              { "$gte": [ "$$this.k", "2018-06-01" ] },
              { "$lt": [ "$$this.k", "2018-07-01" ] }
            ]
          }
        }
      }
    }
  }},
  // Aggregate for the month 
  { "$group": {
    "_id": "$product",           // <-- or whatever your key for the value is
    "total": {
      "$sum": {
        "$sum": {
          "$map": {
            "input": { "$objectToArray": "$days" },
            "in": {
              "$cond": {
                "if": {
                  "$and": [
                    { "$gte": [ "$$this.k", "2018-06-01" ] },
                    { "$lt": [ "$$this.k", "2018-07-01" ] }
                  ]
                },
                "then": "$$this.v",
                "else": 0
              }
            }
          }
        }
      }
    }
  }}
])   
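The range bounds in the pipeline above are hard-coded strings. If the month varies, they could be derived by a small helper like the following sketch; `monthRange` is a hypothetical name, not part of MongoDB, and it assumes the keys are UTC "YYYY-MM-DD" strings as in the question.

```javascript
// Hypothetical helper: turns "YYYY-MM" into the half-open range [start, end)
function monthRange(month) {
  const [year, mon] = month.split("-").map(Number);
  // Date.UTC takes a zero-based month, so passing the one-based `mon`
  // yields the first day of the NEXT month (and rolls the year over in December)
  const next = new Date(Date.UTC(year, mon, 1));
  return {
    start: `${month}-01`,
    end: next.toISOString().slice(0, 10)
  };
}

const june = monthRange("2018-06");
console.log(june.start, june.end);
```

The two strings can then replace the literal "2018-06-01" and "2018-07-01" wherever they appear in the $match and $group stages.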

Other versions with mapReduce

db.collection.mapReduce(
  // Taking the same presumption on your un-named key for "product"
  function() {
    Object.keys(this.days)
      .filter( k => k >= "2018-06-01" && k < "2018-07-01")
      .forEach(k => emit(this.product, this.days[k]));
  },
  function(key,values) {
    return Array.sum(values);
  },
  {
    "out": { "inline": 1 },
    "query": {
      "$where": function() {
        return Object.keys(this.days).some(k => k >= "2018-06-01" && k < "2018-07-01")
      }
    }
  }
)

Both are pretty horrible, since you need to calculate whether the "keys" fall within the required range just to select the documents, and then filter through the keys in those documents again to decide whether to accumulate each value.

Also note that if your "Product_123" is itself the "name of a key" in the document and NOT a "value", then you're performing even more "gymnastics" simply to convert that "key" into a "value" form, which is how databases do things and the whole reason for the unnecessary coercion going on here.


Better Option

So, as opposed to the handling originally shown, where you "should" be accumulating "as you go" with every write to the document(s) at hand, the better option than needing "processing" in order to coerce into an array format is to simply put the data into an array in the first place:

{
        "_id" : ObjectId("5af395c53945a933add62173"),
        "product": "Product_123",
        "month": "2018-05",
        "dates" : [
          { "day": "2018-05-09", "value": 1 },
          { "day": "2018-05-10", "value": 2 }
        ],
        "total" : 3
}
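If existing documents already use dates as keys, they would need converting to this layout once. The following is a minimal client-side sketch of that conversion; `toArrayForm` is a hypothetical helper, and the `days` field name is assumed from the question.

```javascript
// Hypothetical migration helper: converts { days: { "<date>": n } } into the array layout
function toArrayForm(doc, keyField = "days") {
  const dates = Object.keys(doc[keyField])
    .sort()                                   // ISO date strings sort chronologically
    .map(day => ({ day, value: doc[keyField][day] }));
  // Drop the original keyed object and keep every other field unchanged
  const { [keyField]: _dropped, ...rest } = doc;
  return { ...rest, dates, total: dates.reduce((sum, d) => sum + d.value, 0) };
}

const converted = toArrayForm({
  product: "Product_123",
  month: "2018-05",
  days: { "2018-05-10": 2, "2018-05-09": 1 }
});
console.log(JSON.stringify(converted, null, 2));
```

The running total is recomputed during conversion so that later increments start from a consistent value.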

This is infinitely better for purposes of query and further analysis:

db.counters.aggregate([
  { "$match": {
    // "month": "2018-05"    // <-- or really just that, since it's there
    "dates": {
      "$elemMatch": {
        "day": { "$gte": "2018-05-01", "$lt": "2018-06-01" }
      }
    }
  }},
  { "$group": {
    "_id": null,
    "total": {
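      "$sum": {
        "$sum": {
          "$map": {
            "input": {
              "$filter": {
                "input": "$dates",
                "cond": {
                  "$and": [
                    { "$gte": [ "$$this.day", "2018-05-01" ] },
                    { "$lt": [ "$$this.day", "2018-06-01" ] }
                  ]
                }
              }
            },
            // $filter returns whole { day, value } elements, so pick out the number to sum
            "in": "$$this.value"
          }
        }
      }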
    }
  }}
])

That is of course really efficient, and it deliberately ignores the "total" field that is already there, purely for demonstration. But of course you keep the "running accumulation" on writes by doing:

db.counters.updateOne(
   { "product": product, "month": today.substr(0,7), "dates.day": today },
   { "$inc": { "dates.$.value": 1, "total": 1 } }
)

Which is really simple. Adding upserts adds a "little" more complexity:

// A "batch" of operations with bulkWrite
db.counters.bulkWrite([
  // Incrementing the matched element
  { "updateOne": {
    "filter": {
      "product": product,
      "month": today.substr(0,7),
      "dates.day": today 
    },
    "update": {
      "$inc": { "dates.$.value": 1, "total": 1 }
    }
  }},
  // Pushing a new "un-matched" element
  { "updateOne": {
    "filter": {
      "product": product,
      "month": today.substr(0,7),
      "dates.day": { "$ne": today }
    },
    "update": {
      "$push": { "dates": { "day": today, "value": 1 } },
      "$inc": { "total": 1 }
    }
  }},
  // "Upserting" a new document where none matched
  { "updateOne": {
    "filter": {
      "product": product,
      "month": today.substr(0,7)
    },
    "update": {
      "$setOnInsert": {
        "dates": [{ "day": today, "value": 1 }],
        "total": 1
      }
    },
    "upsert": true
  }}
])
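The three operations above are mutually exclusive in effect: for any one write, only one of them changes anything. A minimal in-memory sketch of that decision logic follows. It uses a plain array as a stand-in for the collection, and `recordHit` is a hypothetical name; nothing here talks to MongoDB.

```javascript
// In-memory stand-in for the "counters" collection; illustrative only
function recordHit(store, product, today) {
  const month = today.slice(0, 7);
  const doc = store.find(d => d.product === product && d.month === month);

  if (!doc) {
    // Third operation: no document for this product/month yet, so "upsert" one
    store.push({ product, month, dates: [{ day: today, value: 1 }], total: 1 });
    return "inserted";
  }

  const entry = doc.dates.find(e => e.day === today);
  if (entry) {
    // First operation: the day already exists, so increment it in place
    entry.value += 1;
    doc.total += 1;
    return "incremented";
  }

  // Second operation: document exists but has no element for today, so push one
  doc.dates.push({ day: today, value: 1 });
  doc.total += 1;
  return "pushed";
}

const store = [];
recordHit(store, "Product_123", "2018-05-09"); // creates the document
recordHit(store, "Product_123", "2018-05-10"); // adds a new array element
recordHit(store, "Product_123", "2018-05-10"); // increments the existing element
console.log(JSON.stringify(store[0]));
```

In the real batch the server evaluates all three filters in order, and the filters themselves guarantee that at most one update applies; the sketch just spells out those branches explicitly.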

But generally you're getting the "best of both worlds" by having something simple to accumulate "as you go" as well as something that's easy and efficient to query and analyze later.

The overall moral of the story is to "choose the right structure" for what you actually want to do. Don't put things into "keys" which are clearly intended to be used as "values", since it's an anti-pattern which just adds complexity and inefficiency to the rest of your purposes, even if it seemed right for a "single" purpose when you originally stored it that way.

NOTE Also not really advocating storing "strings" for "dates" in any way here. As noted, the better approach is to use "values" where you really mean "values" you intend to use. When storing date data as a "value" it is always far more efficient and practical to store it as a BSON Date, and NOT a "string".
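As a rough illustration of that note, the sketch below keeps "day" as a JavaScript Date (which drivers store as a BSON Date) and applies the same half-open range test. The sample values and the UTC-midnight convention are assumptions for the example, not something the original documents show.

```javascript
// Assumed document shape: "day" holds a Date instead of a string
const dated = {
  product: "Product_123",
  dates: [
    { day: new Date("2018-05-09T00:00:00Z"), value: 1 },
    { day: new Date("2018-05-10T00:00:00Z"), value: 2 },
    { day: new Date("2018-06-01T00:00:00Z"), value: 9 }
  ]
};

const rangeStart = new Date("2018-05-01T00:00:00Z");
const rangeEnd = new Date("2018-06-01T00:00:00Z");

// Date objects compare by their numeric timestamps, so >= and < work directly
const mayTotal = dated.dates
  .filter(d => d.day >= rangeStart && d.day < rangeEnd)
  .reduce((sum, d) => sum + d.value, 0);

// The server-side range condition would take the same Date values as its bounds
const matchStage = {
  "$match": {
    "dates": { "$elemMatch": { "day": { "$gte": rangeStart, "$lt": rangeEnd } } }
  }
};

console.log(mayTotal);
```

With real dates the range bounds no longer depend on string formatting, and date operators can be applied to the stored values directly.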
