Pull (all but one) documents from a subdocument array in MongoDB

Question

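I have a set of data that is imported over time. On each import I append a "history" subdocument to a history array. The overall structure looks something like this, but with more fields: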

{ _id: ObjectId('000000000000000001'),
  history: [ {date: ISODate("2014-05-25T22:00:00Z"), value: 1},
             {date: ISODate("2014-05-26T22:00:00Z"), value: 1},
             {date: ISODate("2014-05-26T22:00:00Z"), value: 1} 
  ]
}

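The problem is that in some cases the import went wrong and I ended up with duplicate history entries for the same date. I would like to remove all of the duplicates. I tried doing this with the $pull update operator, intending to call it repeatedly until each date had the correct number of history entries. The trouble is that I have more than a million data points, each with a different number of duplicates, some with as many as 12. Is there any way to pull all but one without using mapReduce? I was thinking of something like: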

db.test.update({'history.date': new Date(2014,4,26)},
               {
                $pullAll : 
                   {'history': {date: new Date(2014,4,27)}},
                $push : {'history' : {}}
               }, {multi:true})
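
One detail worth checking in a query like this: in JavaScript, and therefore in the mongo shell, the month argument of `new Date(year, month, day)` is zero-based and the result is midnight in the local time zone. So `new Date(2014,4,26)` is 26 May, and it only equals a stored `2014-05-25T22:00:00Z` on a machine running at UTC+2. A minimal sketch in plain JavaScript, runnable without a database; the variable names are illustrative:

```javascript
// Month index 4 is May; the time is midnight in the local zone.
var localDate = new Date(2014, 4, 26);
console.log(localDate.getMonth(), localDate.getDate()); // 4 26

// Building the instant from UTC components avoids any time-zone dependence.
var utcDate = new Date(Date.UTC(2014, 4, 26, 22, 0, 0));
console.log(utcDate.toISOString()); // 2014-05-26T22:00:00.000Z
```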

Answer 1

Try this; it works fine:

db.collection.find().forEach(function(doc) {
     db.collection.update(
         { "_id": doc._id },
         { "$set": { "history": [doc.history] } }
     );
})
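
Note that `[doc.history]` wraps the existing array in a new outer array; on its own it does not drop anything. If the goal is one entry per date, the array could be deduplicated in the shell before it is written back with `$set`. A minimal sketch of that step in plain JavaScript, assuming every entry has a `date` field holding a `Date`; the helper name `dedupeByDate` is hypothetical:

```javascript
// Return a new array that keeps only the first entry seen for each date.
function dedupeByDate(history) {
    var seen = {};
    return history.filter(function (entry) {
        var key = entry.date.getTime(); // milliseconds since the epoch
        if (seen[key]) {
            return false;               // a later duplicate: drop it
        }
        seen[key] = true;
        return true;
    });
}

var sampleHistory = [
    { date: new Date("2014-05-25T22:00:00Z"), value: 1 },
    { date: new Date("2014-05-26T22:00:00Z"), value: 1 },
    { date: new Date("2014-05-26T22:00:00Z"), value: 1 }
];
var deduped = dedupeByDate(sampleHistory);
console.log(deduped.length); // 2
```

Inside the loop, the result would then be passed as `{ "$set": { "history": dedupeByDate(doc.history) } }`.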

Answer 2

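The problem with what you have presented is that both operations act on the "history" array, so you end up with conflicting paths in your statement. The operations are not executed "sequentially" as you might think, and this conflict should produce an error when the query is parsed.

Also, you are essentially "wiping" the content of the array, and if your notation is just shorthand, not meant to literally "push" an empty object {}, then there is currently no way to update a document based on existing values found within that document.

So the final approach is to do this in a loop, which is really not that bad: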

 db.collection.find().forEach(function(doc) {
     db.collection.update(
         { "_id": doc._id },
         { "$set": { "history": [] } }
     );
     db.collection.update(
         { "_id": doc._id },
         { "$addToSet": { "history": { "$each": doc.history } } }
     );
 })
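
The loop works because `$addToSet` appends an element only when no identical element is already in the array, so re-adding the saved `doc.history` to an emptied array collapses exact duplicates. The sketch below imitates that behaviour in plain JavaScript to show what survives; `addToSetEach` is a hypothetical helper, and comparing `JSON.stringify` output is only a stand-in for the server's document comparison:

```javascript
// Append each item to target unless an identical item is already present.
function addToSetEach(target, items) {
    items.forEach(function (item) {
        var key = JSON.stringify(item);
        var exists = target.some(function (existing) {
            return JSON.stringify(existing) === key;
        });
        if (!exists) {
            target.push(item);
        }
    });
    return target;
}

var saved = [
    { date: "2014-05-26T22:00:00Z", value: 1 },
    { date: "2014-05-26T22:00:00Z", value: 1 },  // exact duplicate: dropped
    { date: "2014-05-26T22:00:00Z", value: 2 }   // same date, other value: kept
];
var rebuilt = addToSetEach([], saved);
console.log(rebuilt.length); // 2
```

Entries that share a date but differ in any other field are therefore kept, which matters if the duplicated imports are not identical in every field.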

Of course, if you have MongoDB 2.6 or later you can do this with Bulk operations, which makes things very efficient:

 var count = 0;
 var bulk = db.collection.initializeOrderedBulkOp();

 db.collection.find().forEach(function(doc) {

    bulk.find({ "_id": doc._id }).update({
        "$set": { "history": [] }
    });
    bulk.find({ "_id": doc._id }).update({
        "$addToSet": { "history": { "$each": doc.history } }
    });
    count++;

    if ( count % 500 == 0 ) {
        bulk.execute();
        bulk = db.collection.initializeOrderedBulkOp();
        count = 0;
    }

 });

 if ( count > 0 )
     bulk.execute();

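This pairs up the operations and sends them in sets of 500 documents (1000 operations), which should sit safely under the 16MB BSON limit, and of course you can tune that as needed. Each update is still executed sequentially, but in this example the actual send/response round trip to the server happens only once per 500 documents processed.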
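
The counter logic can be checked in isolation. The following sketch replaces the bulk object with a plain array so it runs without a server; `processInBatches` and its `flush` callback are hypothetical names, and one queued item stands in for the pair of updates issued per document:

```javascript
// Queue one item per document and flush whenever batchSize items are pending.
function processInBatches(docs, batchSize, flush) {
    var pending = [];
    docs.forEach(function (doc) {
        pending.push(doc);
        if (pending.length === batchSize) {
            flush(pending);   // corresponds to bulk.execute()
            pending = [];     // corresponds to starting a new bulk operation
        }
    });
    if (pending.length > 0) {
        flush(pending);       // send the final, partially filled batch
    }
}

var batchSizes = [];
var docs = [];
for (var i = 0; i < 1201; i++) {
    docs.push({ _id: i });
}
processInBatches(docs, 500, function (batch) {
    batchSizes.push(batch.length);
});
console.log(batchSizes); // [ 500, 500, 201 ]
```

With 1201 documents and a batch size of 500, the server would be contacted three times instead of 1201 times.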

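You might also consider using the aggregation framework to find the documents that contain duplicates, which is more efficient because documents that need no change are not updated: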

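db.collection.aggregate([
    // Carry the whole original document along in _id.
    { "$project": {
       "_id": "$$ROOT",
       "history": 1
    }},
    { "$unwind": "$history" },
    // Count each (document, date, value) combination.
    { "$group": {
        "_id": { "doc": "$_id._id", "date": "$history.date", "value": "$history.value" },
        "orig": { "$first": "$_id" },
        "count": { "$sum": 1 }
    }},
    // Keep only the combinations that occur more than once.
    { "$match": { "count": { "$gt": 1 } } },
    // One result per affected document, with its original history.
    { "$group": {
        "_id": "$orig._id",
        "history": { "$first": "$orig.history" }
    }}
]).forEach(function(doc) {
    // same as above
});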
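
One way to picture "finding the documents that contain duplicates": count how often each date and value pair occurs within a document, and keep only the documents where some count exceeds one. A sketch in plain JavaScript over an in-memory array; `idsWithDuplicates` is a hypothetical helper and the sample data is invented for illustration:

```javascript
// Return the _id of every document whose history repeats a date/value pair.
function idsWithDuplicates(docs) {
    return docs.filter(function (doc) {
        var counts = {};
        return doc.history.some(function (entry) {
            var key = entry.date + "|" + entry.value;
            counts[key] = (counts[key] || 0) + 1;
            return counts[key] > 1;   // stop at the first repeated pair
        });
    }).map(function (doc) {
        return doc._id;
    });
}

var sample = [
    { _id: 1, history: [ { date: "2014-05-25", value: 1 },
                         { date: "2014-05-26", value: 1 } ] },
    { _id: 2, history: [ { date: "2014-05-26", value: 1 },
                         { date: "2014-05-26", value: 1 } ] }
];
var affected = idsWithDuplicates(sample);
console.log(affected); // [ 2 ]
```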

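You can even use that as a springboard for removing the duplicates, so that each loop iteration sends just one update, using $set with an array from which the duplicates have already been removed: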

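 var count = 0;
 var bulk = db.collection.initializeOrderedBulkOp();

 db.collection.aggregate([
     { "$unwind": "$history" },
     // One result per distinct (document, date, value) combination.
     { "$group": {
         "_id": { "doc": "$_id", "date": "$history.date", "value": "$history.value" }
     }},
     // $group output order is unspecified, so restore date order first.
     { "$sort": { "_id.date": 1 } },
     // Rebuild each document's history from the distinct combinations.
     // Only the grouped fields (date, value) are carried over.
     { "$group": {
         "_id": "$_id.doc",
         "history": { "$push": { "date": "$_id.date", "value": "$_id.value" } }
     }}
 ]).forEach(function(doc) {

     bulk.find({ "_id": doc._id }).update({
         "$set": { "history": doc.history }
     });
     count++;

     if ( count % 500 == 0 ) {
         bulk.execute();
         bulk = db.collection.initializeOrderedBulkOp();
         count = 0;
     }
 });

 if ( count > 0 )
     bulk.execute();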

So there are a few ways to get rid of those duplicate entries that you can consider and adapt to your needs.

Answer 3

I was about to implement one of the scripts above when it occurred to me that this can be done in three steps in the mongo shell:

date = new Date(2014,4,26);
temp = 'SOMESPECIALTEMPVALUE'

db.test.update({'history.date': date},
           {$set: {
               'history.$.date' : temp
           }}, {multi:true})

db.test.update({'history.date': temp},
           {$pull: {
               'history' : {date : date}
           }}, {multi:true})

db.test.update({'history.date': temp},
           {$set: {
               'history.$.date' : date
           }}, {multi:true})

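This works because $ only updates the first matching subdocument. Then, using pull, I remove all of the remaining duplicates. Finally, I reset the remaining temp value to its original value. This works well enough for me since it is a one-off operation with only three dates in question. Otherwise I would probably go with the script approach.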
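
To see why the three statements leave exactly one entry per date, the same sequence can be replayed on an in-memory array. This is a sketch in plain JavaScript that only imitates the positional `$` update and `$pull`; the function name `keepOnePerDate` and the string dates are illustrative assumptions:

```javascript
// Replay the three steps on one document's history array.
function keepOnePerDate(history, date, temp) {
    // Step 1: like 'history.$.date', rename only the first matching entry.
    for (var i = 0; i < history.length; i++) {
        if (history[i].date === date) {
            history[i].date = temp;
            break;
        }
    }
    // Step 2: like $pull, remove every entry that still carries the date.
    var kept = history.filter(function (entry) {
        return entry.date !== date;
    });
    // Step 3: restore the original date on the placeholder entry.
    kept.forEach(function (entry) {
        if (entry.date === temp) {
            entry.date = date;
        }
    });
    return kept;
}

var result = keepOnePerDate([
    { date: "2014-05-25", value: 1 },
    { date: "2014-05-26", value: 1 },
    { date: "2014-05-26", value: 1 }
], "2014-05-26", "SOMESPECIALTEMPVALUE");
console.log(result.length); // 2
```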

Solution 1
2 Accepted 2017-12-20 10:44:22

Solution 2
1 2014-06-05 01:11:41

Solution 3
0 2014-06-05 21:26:38
