
MongoDB: aggregate values on array of subdocuments

I have a collection of documents:

{
    "_id": ObjectId("55dc62647cda24224372e308"),
    "last_modified": ISODate("2015-07-01T15:57:26.874Z"),
    "services": [
        {"last_modified": ISODate("2015-05-08T07:10:11.250Z")},
        {...}
    ]
}

I need to refresh the last_modified field of each document by finding the maximum last_modified value among its services:

>db.documents.find().map(function(d){
    db.documents.update(
        {_id: d._id},
        {$set: {last_updated: Math.max(d.services.last_updated)}}
    )
})
Tue Aug 25 16:01:20.536 TypeError: Cannot read property 'last_modified' of undefined

How can I access and aggregate a property of the subdocuments in an array?

The basic process here is that you need to find the latest date in each document's services array and write that value back to the document. Of course you need a loop, since you cannot access a value of a document directly in an update statement. So you need to read each document first, but Bulk operations help here:

var bulk = db.documents.initializeOrderedBulkOp(),
    count = 0;

db.documents.find().forEach(function(doc) {
  // Sort ascending by date; the comparator must return a number, not a boolean
  var last_modified = doc.services.sort(function(a,b) {
    return a.last_modified - b.last_modified;
  }).slice(-1)[0].last_modified;

  bulk.find({ "_id": doc._id }).updateOne({
    "$set": { "last_modified": last_modified }
  });
  count++;

  if ( count % 1000 == 0 ) {
    bulk.execute();
    bulk = db.documents.initializeOrderedBulkOp();
  }

});

if ( count % 1000 != 0 )
  bulk.execute();
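
The per-document step can also be done without sorting. The following is a minimal sketch in plain JavaScript, assuming each service carries a last_modified date; the helper name maxLastModified is hypothetical, and plain Date objects stand in for the shell's ISODate values. It also returns null for a missing or empty array, a case the loop above does not guard against.

```javascript
// Hypothetical helper: returns the latest last_modified among the services,
// or null when the array is missing or empty.
function maxLastModified(services) {
  if (!Array.isArray(services) || services.length === 0) {
    return null;
  }
  // Single pass: keep whichever date is later.
  return services.reduce(function (latest, service) {
    return service.last_modified > latest ? service.last_modified : latest;
  }, services[0].last_modified);
}

// Sample data for illustration only.
var sample = [
  { last_modified: new Date("2015-05-08T07:10:11.250Z") },
  { last_modified: new Date("2015-07-01T15:57:26.874Z") },
  { last_modified: new Date("2015-03-01T00:00:00.000Z") }
];
var latest = maxLastModified(sample);
```

Inside the forEach callback you could call maxLastModified(doc.services) and skip the update when it returns null.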

Better yet, consider sorting the array itself whenever new items are added. This is basically done with the $sort modifier to $push:

 db.documents.update(
     { "_id": id },
     { "$push": { 
         "services": {
             "$each": [{ "last_modified": date }],
             "$sort": { "last_modified": 1 }
         }
     }}
 )

Or even forget the $sort, since all array values are appended to the end anyway unless you tell the operation to do otherwise. As long as services are added in chronological order, the last element is then always the latest one.

Then you can basically shorten the procedure by using $slice to return only the last array element:

var bulk = db.documents.initializeOrderedBulkOp(),
    count = 0;

db.documents.find(
    {},
    { 
        "services": { "$slice": -1 }
    }
).forEach(function(doc) {

  // With $slice: -1 only the last (latest) service is returned
  bulk.find({ "_id": doc._id }).updateOne({
    "$set": { "last_modified": doc.services[0].last_modified }
  });
  count++;

  if ( count % 1000 == 0 ) {
    bulk.execute();
    bulk = db.documents.initializeOrderedBulkOp();
  }

});

if ( count % 1000 != 0 )
  bulk.execute();

The aggregation framework could be used here, but it is really not necessary, considering how simple it is to get the maximum date value from the array in each document anyway:

var bulk = db.documents.initializeOrderedBulkOp(),
    count = 0;

db.documents.aggregate([
    { "$unwind": "$services" },
    { "$group": {
        "_id": "$_id",
        "last_modified": { "$max": "$services.last_modified" }
    }}
]).forEach(function(doc) {

  bulk.find({ "_id": doc._id }).updateOne({
    "$set": { "last_modified": doc.last_modified }
  });
  count++;

  if ( count % 1000 == 0 ) {
    bulk.execute();
    bulk = db.documents.initializeOrderedBulkOp();
  }

});

if ( count % 1000 != 0 )
  bulk.execute();

And because of the usage of $unwind, this actually comes at a much greater cost than is necessary.
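
On newer servers the read-then-write loop may not be needed at all. The following is a sketch only, assuming MongoDB 4.2 or later, where an update can take an aggregation pipeline and $max can be applied to the array of dates that "$services.last_modified" resolves to. The function name refreshLastModified is hypothetical, and its collection parameter is assumed to be any object exposing updateMany(filter, update), such as db.documents in the shell.

```javascript
// Sketch only: assumes MongoDB 4.2+ (pipeline-style updates).
// `collection` is assumed to expose updateMany(filter, update).
function refreshLastModified(collection) {
  return collection.updateMany(
    // Only touch documents that have at least one service.
    { "services.0": { "$exists": true } },
    [
      // "$services.last_modified" resolves to an array of dates;
      // $max picks the latest one, with no $unwind involved.
      { "$set": { "last_modified": { "$max": "$services.last_modified" } } }
    ]
  );
}
```

In the shell this would be invoked as refreshLastModified(db.documents). On older servers the Bulk approaches above remain the way to go.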
