
MongoDB: aggregate values on an array of subdocuments

I have a collection of documents:

{
    "_id": ObjectId("55dc62647cda24224372e308"),
    "last_modified": ISODate("2015-07-01T15:57:26.874Z"),
    "services": [
        {"last_modified": ISODate("2015-05-08T07:10:11.250Z")},
        {...}
    ]
}

And I need to refresh the last_modified field of each document by finding the max last_modified value of its services:

>db.documents.find().map(function(d){
    db.documents.update(
        {_id: d._id},
        {$set: {last_modified: Math.max(d.services.last_modified)}}
    )
})
Tue Aug 25 16:01:20.536 TypeError: Cannot read property 'last_modified' of undefined

How can I access and aggregate a property of the subdocuments in an array?

The basic process is to find the maximum date in the services array of each document and write that value back. That requires a loop, because an update statement cannot reference another value of the same document directly. So you need to read each document first, but Bulk operations help here:

var bulk = db.documents.initializeOrderedBulkOp(),
    count = 0;

db.documents.find().forEach(function(doc) {
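  // Sort ascending by date (the comparator must return a number, not a boolean),
  // so the last element is the most recent
  var last_modified = doc.services.sort(function(a,b) {
    return a.last_modified - b.last_modified;
  }).slice(-1)[0].last_modified;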

  bulk.find({ "_id": doc._id }).updateOne({
    "$set": { "last_modified": last_modified }
  });
  count++;

  if ( count % 1000 == 0 ) {
    bulk.execute();
    bulk = db.documents.initializeOrderedBulkOp();
  }

});

if ( count % 1000 != 0 )
  bulk.execute();
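
Sorting is not strictly required to find the newest entry. As a minimal sketch, assuming every element of services carries a last_modified date, the maximum can be taken in a single pass. Plain JavaScript Date objects stand in for ISODate here, the sample dates are illustrative, and maxLastModified is a hypothetical helper name rather than anything provided by the shell. It also shows why the attempt in the question fails: an array has no last_modified property of its own, so the values have to be mapped out of the elements first.

```javascript
// Hypothetical helper: returns the newest last_modified in the array,
// or null when the array is missing or empty.
function maxLastModified(services) {
  if (!Array.isArray(services) || services.length === 0) {
    return null;
  }
  // Pull the date out of each subdocument as a millisecond timestamp.
  var times = services.map(function (s) {
    return s.last_modified.getTime();
  });
  return new Date(Math.max.apply(null, times));
}

// Illustrative sample data, not taken from a real collection.
var services = [
  { last_modified: new Date("2015-05-08T07:10:11.250Z") },
  { last_modified: new Date("2015-07-01T15:57:26.874Z") },
  { last_modified: new Date("2015-03-02T09:00:00.000Z") }
];

// The array itself has no such property, only its elements do.
console.log(services.last_modified);                   // undefined
console.log(maxLastModified(services).toISOString());  // 2015-07-01T15:57:26.874Z
```

Inside the forEach loop, the result of such a helper could be used as the value for "$set", skipping documents for which it returns null.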

Better yet, consider keeping the array itself sorted as new items are added. This is done with the $sort modifier to $push:

 db.documents.update(
     { "_id": id },
     { "$push": {
         "services": {
             "$each": [{ "last_modified": date }],
             "$sort": { "last_modified": 1 }
         }
     }}
 )

Or even forget the $sort, since new array values are appended to the end anyway unless you tell the operation to do otherwise. That only holds as long as entries are added in chronological order.

Then you can shorten the procedure using $slice:

var bulk = db.documents.initializeOrderedBulkOp(),
    count = 0;

db.documents.find(
    {},
    {
        // Return only the last (most recent) element of the sorted array
        "services": { "$slice": -1 }
    }
).forEach(function(doc) {

  bulk.find({ "_id": doc._id }).updateOne({
    "$set": { "last_modified": doc.services[0].last_modified }
  });
  count++;

  if ( count % 1000 == 0 ) {
    bulk.execute();
    bulk = db.documents.initializeOrderedBulkOp();
  }

});

if ( count % 1000 != 0 )
  bulk.execute();
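
The shortcut relies on the array staying in ascending date order, so that its last element is always the newest. The following plain JavaScript model only illustrates that invariant. pushSorted is a hypothetical stand-in for what $push with $each and $sort does on the server, and Date objects replace ISODate.

```javascript
// Hypothetical in-memory model of $push with $each and $sort: { last_modified: 1 }.
// It appends the new items, then re-sorts the whole array ascending by date.
function pushSorted(services, items) {
  return services.concat(items).sort(function (a, b) {
    return a.last_modified - b.last_modified;
  });
}

var services = [];
services = pushSorted(services, [{ last_modified: new Date("2015-07-01T15:57:26.874Z") }]);
// An older entry arriving late still ends up before the newer one.
services = pushSorted(services, [{ last_modified: new Date("2015-05-08T07:10:11.250Z") }]);

// Equivalent of the { "$slice": -1 } projection: keep only the last element.
var newest = services.slice(-1)[0].last_modified;
console.log(newest.toISOString()); // 2015-07-01T15:57:26.874Z
```

Without the re-sort, the late-arriving older entry would sit last and taking the final element would return the wrong date, which is the case to watch for when dropping $sort.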

The aggregation framework could be used here, but it is really not necessary, considering how simple it is to get the maximum date value per document anyway:

var bulk = db.documents.initializeOrderedBulkOp(),
    count = 0;

db.documents.aggregate([
    { "$unwind": "$services" },
    { "$group": {
        "_id": "$_id",
        "last_modified": { "$max": "$services.last_modified" }
    }}
]).forEach(function(doc) {

  bulk.find({ "_id": doc._id }).updateOne({
    "$set": { "last_modified": doc.last_modified }
  });
  count++;

  if ( count % 1000 == 0 ) {
    bulk.execute();
    bulk = db.documents.initializeOrderedBulkOp();
  }

});

if ( count % 1000 != 0 )
  bulk.execute();

And because of the usage of $unwind this actually comes at a much greater cost than is necessary.
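
On servers where $max is accepted as an expression inside $project (MongoDB 3.2 and later), the $unwind and $group stages can be avoided. This is a sketch under that version assumption. It only builds the pipeline, and the commented line shows how it would be passed to aggregate in the shell.

```javascript
// "$services.last_modified" resolves to an array of the dates in each document,
// and $max over that array yields the newest one without unwinding.
var pipeline = [
  { "$project": {
      "last_modified": { "$max": "$services.last_modified" }
  }}
];

// In the shell, the results would feed the same bulk update loop as before:
// db.documents.aggregate(pipeline).forEach(function(doc) { ... });
console.log(JSON.stringify(pipeline));
```

Each result document then carries _id and the computed last_modified, the same shape the $group version produces.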
