How to sum values in a nested date range in MongoDB

I need to sum the values for 2018-06-01 through 2018-06-30 for each document in the collection. Each key in "days" is a different date and value. What should the mongo aggregate command look like? The result should look something like { _id: Product_123, June_Sum: value }.

That's really not a great structure for the sort of operation you now want to do. The whole point of keeping data in such a format is that you "increment" it as you go.

For example:

 var now = Date.now(),
     today = new Date(now - ( now % ( 1000 * 60 * 60 * 24 ))).toISOString().substr(0,10);

 var product = "Product_123";

 db.counters.updateOne(
   { 
     "month": today.substr(0,7),
     "product": product
   },
   { 
     "$inc": { 
       [`dates.${today}`]: 1,
       "totals": 1
     }
   },
   { "upsert": true }
 )

In that way the subsequent updates with $inc apply to both the "key" used for the "date" and also increment the "totals" property of the matched document. So after a few iterations you would end up with something like:

{
        "_id" : ObjectId("5af395c53945a933add62173"),
        "product": "Product_123",
        "month": "2018-05",
        "dates" : {
                "2018-05-10" : 2,
                "2018-05-09" : 1
        },
        "totals" : 3
}

If you're not actually doing that then you "should" be since it's the intended usage pattern for such a structure.
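
As a rough sketch of that usage pattern, the update can be wrapped in a small helper so the computed day key and the month are always derived from the same date. The name buildCounterUpdate and its amount parameter are assumptions made for illustration; the field names follow the example above.

```javascript
// Hypothetical helper: builds the arguments for the updateOne() call shown above.
// "dates" and "totals" follow the answer's example document; the helper name
// and the `amount` parameter are assumptions for illustration.
function buildCounterUpdate(product, date, amount = 1) {
  // toISOString() is always UTC, so the key is the UTC calendar day
  const day = date.toISOString().slice(0, 10);   // e.g. "2018-05-10"
  const month = day.slice(0, 7);                 // e.g. "2018-05"
  return {
    filter: { month, product },
    // Computed property name puts the day into the "dates.<day>" path
    update: { $inc: { [`dates.${day}`]: amount, totals: amount } },
    options: { upsert: true }
  };
}

const spec = buildCounterUpdate("Product_123", new Date("2018-05-10T15:30:00Z"));
console.log(JSON.stringify(spec.update));
// {"$inc":{"dates.2018-05-10":1,"totals":1}}

// With a connected shell or driver you would then call:
// db.counters.updateOne(spec.filter, spec.update, spec.options)
```

Reading the month back is then just a matter of fetching "totals" for the matching "month" and "product", with no aggregation needed.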

Without keeping a "totals" or similar entry within the document(s) storing these keys, the only methods left for "aggregation" are to effectively coerce the "keys" into an "array" form.

MongoDB 3.6 with $objectToArray

db.collection.aggregate([
  // Only consider documents with entries within the range
  { "$match": {
    "$expr": {
      "$anyElementTrue": {
        "$map": {
          "input": { "$objectToArray": "$days" },
          "in": {
            "$and": [
              { "$gte": [ "$$this.k", "2018-06-01" ] },
              { "$lt": [ "$$this.k", "2018-07-01" ] }
            ]
          }
        }
      }
    }
  }},
  // Aggregate for the month 
  { "$group": {
    "_id": "$product",           // <-- or whatever your key for the value is
    "total": {
      "$sum": {
        "$sum": {
          "$map": {
            "input": { "$objectToArray": "$days" },
            "in": {
              "$cond": {
                "if": {
                  "$and": [
                    { "$gte": [ "$$this.k", "2018-06-01" ] },
                    { "$lt": [ "$$this.k", "2018-07-01" ] }
                  ]
                },
                "then": "$$this.v",
                "else": 0
              }
            }
          }
        }
      }
    }
  }}
])   
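
Since the range test appears twice in that pipeline, one way to keep the two copies in sync is to generate the pipeline from a function. This is only a sketch: buildKeyRangeSumPipeline is a hypothetical helper, and the "days" and "product" field names are the same presumptions as above.

```javascript
// Hypothetical helper: returns the same two-stage pipeline as above for any
// half-open range [start, end). Field names "days" and "product" are the
// answer's presumptions about the document shape.
function buildKeyRangeSumPipeline(start, end, outField = "total") {
  // One shared range test, applied to each { k, v } pair from $objectToArray
  const inRange = {
    $and: [
      { $gte: ["$$this.k", start] },
      { $lt: ["$$this.k", end] }
    ]
  };
  const pairs = { $objectToArray: "$days" };
  return [
    // Only consider documents with at least one key inside the range
    { $match: { $expr: { $anyElementTrue: { $map: { input: pairs, in: inRange } } } } },
    // Sum the in-range values per document, then across documents
    { $group: {
      _id: "$product",
      [outField]: {
        $sum: {
          $sum: {
            $map: {
              input: pairs,
              in: { $cond: { if: inRange, then: "$$this.v", else: 0 } }
            }
          }
        }
      }
    }}
  ];
}

const pipeline = buildKeyRangeSumPipeline("2018-06-01", "2018-07-01", "June_Sum");
console.log(JSON.stringify(pipeline, null, 2));

// With a connected shell or driver:
// db.collection.aggregate(pipeline)
```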

Other versions with mapReduce

db.collection.mapReduce(
  // Taking the same presumption on your un-named key for "product"
  function() {
    Object.keys(this.days)
      .filter( k => k >= "2018-06-01" && k < "2018-07-01")
      .forEach(k => emit(this.product, this.days[k]));
  },
  function(key,values) {
    return Array.sum(values);
  },
  {
    "out": { "inline": 1 },
    "query": {
      "$where": function() {
        return Object.keys(this.days).some(k => k >= "2018-06-01" && k < "2018-07-01")
      }
    }
  }
)

Both are pretty horrible since you need to calculate whether the "keys" fall within the required range even to select the documents and even then still filter through the keys in those documents again in order to decide whether to accumulate for it or not.
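
To make the two passes concrete, here is roughly what both approaches compute, written as plain JavaScript over in-memory objects. The sample documents and the sumByProduct name are made up for illustration; nothing here talks to a database.

```javascript
// Sample documents in the question's presumed shape (made-up values)
const docs = [
  { product: "Product_123", days: { "2018-05-31": 4, "2018-06-01": 1, "2018-06-30": 2 } },
  { product: "Product_123", days: { "2018-06-15": 5 } },
  { product: "Product_456", days: { "2018-07-01": 9 } }
];

// ISO date strings compare correctly as plain strings
const inRange = (k, start, end) => k >= start && k < end;

function sumByProduct(docs, start, end) {
  const totals = {};
  docs
    // Pass 1: select documents (the $match / "query" step)
    .filter(d => Object.keys(d.days).some(k => inRange(k, start, end)))
    // Pass 2: test every key again while accumulating (the $group / emit step)
    .forEach(d => {
      for (const [k, v] of Object.entries(d.days)) {
        if (inRange(k, start, end)) {
          totals[d.product] = (totals[d.product] || 0) + v;
        }
      }
    });
  return totals;
}

console.log(sumByProduct(docs, "2018-06-01", "2018-07-01"));
// { Product_123: 8 }
```

Every key of every candidate document is inspected twice, and no index can help with either pass because the dates live in field names rather than in values.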

Also note that if your "Product_123" is also the "name of a key" in the document and NOT a "value", then you're performing even more "gymnastics" simply to convert that "key" into a "value" form. Values are what databases are built to work with, and that is the whole point about the unnecessary coercion going on here.


Better Option

So rather than the handling originally shown, where you "should" be accumulating "as you go" with every write to the document(s) at hand, the better option than needing "processing" to coerce into an array format is simply to put the data into an array in the first place:

{
        "_id" : ObjectId("5af395c53945a933add62173"),
        "product": "Product_123",
        "month": "2018-05",
        "dates" : [
          { "day": "2018-05-09", "value": 1 },
          { "day": "2018-05-10", "value": 2 }
        ],
        "totals" : 3
}

This form is far better for purposes of query and further analysis:

db.counters.aggregate([
  { "$match": {
    // "month": "2018-05"    // <-- or really just that, since it's there
    "dates": {
      "$elemMatch": {
        "day": { "$gte": "2018-05-01", "$lt": "2018-06-01" }
      }
    }
  }},
  { "$group": {
    "_id": null,
    "total": {
      "$sum": {
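        "$sum": {
          // Keep only the in-range elements, then extract each "value"
          "$map": {
            "input": {
              "$filter": {
                "input": "$dates",
                "cond": {
                  "$and": [
                    { "$gte": [ "$$this.day", "2018-05-01" ] },
                    { "$lt": [ "$$this.day", "2018-06-01" ] }
                  ]
                }
              }
            },
            "in": "$$this.value"
          }
        }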
      }
    }
  }}
])

That is efficient, and it deliberately ignores the "totals" field that is already there, for demonstration only. But of course you keep the "running accumulation" on writes by doing:

db.counters.updateOne(
   { "product": product, "month": today.substr(0,7), "dates.day": today },
   { "$inc": { "dates.$.value": 1, "totals": 1 } }
)

Which is really simple. Adding upserts adds a "little" more complexity:

// A "batch" of operations with bulkWrite
db.counters.bulkWrite([
  // Incrementing the matched element
  { "updateOne": {
    "filter": {
      "product": product,
      "month": today.substr(0,7),
      "dates.day": today
    },
    "update": {
      "$inc": { "dates.$.value": 1, "totals": 1 }
    }
  }},
  // Pushing a new "un-matched" element
  { "updateOne": {
    "filter": {
      "product": product,
      "month": today.substr(0,7),
      "dates.day": { "$ne": today }
    },
    "update": {
      "$push": { "dates": { "day": today, "value": 1 } },
      "$inc": { "totals": 1 }
    }
  }},
  // "Upserting" a new document where not matched
  { "updateOne": {
    "filter": {
      "product": product,
      "month": today.substr(0,7)
    },
    "update": {
      "$setOnInsert": {
        "dates": [{ "day": today, "value": 1 }],
        "totals": 1
      }
    },
    "upsert": true
  }}
])
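
The order of the three operations matters, because exactly one of them should change anything for a given write. A minimal in-memory sketch of the three cases follows; applyBatch and the store array are hypothetical stand-ins that only mimic the matching rules, with the running total kept in "totals" as in the example document.

```javascript
// In-memory stand-in for the three bulkWrite operations, applied in order.
// `applyBatch` and `store` are hypothetical; they mimic the matching rules only.
function applyBatch(store, product, today) {
  const month = today.slice(0, 7);
  const doc = store.find(d => d.product === product && d.month === month);

  // 1. Element for today already present: increment it and the running total.
  //    Operation 2 then does not match, and operation 3 matches but changes nothing.
  const entry = doc && doc.dates.find(e => e.day === today);
  if (entry) {
    entry.value += 1;
    doc.totals += 1;
    return "incremented";
  }
  // 2. Document present but no element for today: push a new element.
  //    Operation 3 matches the document, so $setOnInsert does nothing.
  if (doc) {
    doc.dates.push({ day: today, value: 1 });
    doc.totals += 1;
    return "pushed";
  }
  // 3. No document for this product and month: the upsert creates it.
  store.push({ product, month, dates: [{ day: today, value: 1 }], totals: 1 });
  return "inserted";
}

const store = [];
console.log(applyBatch(store, "Product_123", "2018-05-09")); // inserted
console.log(applyBatch(store, "Product_123", "2018-05-09")); // incremented
console.log(applyBatch(store, "Product_123", "2018-05-10")); // pushed
console.log(store[0].totals);                                 // 3
```

If the "push" operation ran before the "increment" one, a new day would be pushed and then immediately incremented again, counting the same write twice.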

But generally you're getting the "best of both worlds" by having something simple to accumulate "as you go" as well as something that's easy and efficient to query and do other analysis on later.

The overall moral of the story is to "choose the right structure" for what you actually want to do. Don't put things into "keys" which are clearly intended to be used as "values", since it's an anti-pattern which just adds complexity and inefficiency to the rest of your purposes, even if it seemed right for a "single" purpose when you originally stored it that way.

NOTE: This is also not advocating storing "strings" for "dates" in any way. As noted, the better approach is to use "values" where you really mean "values" you intend to use. When storing date data as a "value" it is always far more efficient and practical to store it as a BSON Date, and NOT a "string".
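
As a small sketch of what that looks like in practice, range boundaries can be built as JavaScript Date objects, which the shell and the Node.js driver store as BSON Dates. The monthRange helper is hypothetical, and the "day" field is assumed here to hold a BSON Date rather than the strings used in the examples above.

```javascript
// Hypothetical helper: half-open UTC range [start, end) for a calendar month.
// `month` is 1-based (6 = June).
function monthRange(year, month) {
  return {
    start: new Date(Date.UTC(year, month - 1, 1)),
    // Date.UTC rolls a month index of 12 over into January of the next year
    end: new Date(Date.UTC(year, month, 1))
  };
}

const { start, end } = monthRange(2018, 6);
console.log(start.toISOString(), end.toISOString());
// 2018-06-01T00:00:00.000Z 2018-07-01T00:00:00.000Z

// Assuming array elements shaped like { "day": <BSON Date>, "value": 1 }:
const match = { dates: { $elemMatch: { day: { $gte: start, $lt: end } } } };
```

The same start and end values can be used in the $filter condition, so the comparison is done on real dates instead of relying on string ordering.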
