Pull all but one document from an array of subdocuments in MongoDB

Question

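I have a set of data that I have been importing over a period of time. On every import, I append a "history" subdocument to a history array. The overall structure is similar to the following, but with more fields: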

{ _id: ObjectId('000000000000000000000001'),
  history: [ {date: ISODate("2014-05-25T22:00:00Z"), value: 1},
             {date: ISODate("2014-05-26T22:00:00Z"), value: 1},
             {date: ISODate("2014-05-26T22:00:00Z"), value: 1}
  ]
}

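The problem is that in some cases the import went wrong, and I ended up with duplicate history entries for the same date. I want to remove all of the duplicates. I tried doing this with the $pull update operator, calling it repeatedly until each date had the correct number of history entries. The problem is that I have over a million data points, each with a different number of duplicates; some have as many as 12. Is there any way to pull all but one without using mapReduce? I was thinking of something like: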

db.test.update({'history.date': new Date(2014,4,26)},
               {
                $pullAll : 
                   {'history': {date: new Date(2014,4,27)}},
                $push : {'history' : {}}
               }, {multi:true})

Answer 1

Try this, it works well:

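db.collection.find().forEach(function(doc) {
     var seen = {};  // dates already kept for this document
     var history = doc.history.filter(function(h) {
         var key = h.date.getTime();  // keep only the first entry per date
         return seen[key] ? false : (seen[key] = true);
     });
     db.collection.update(
         { "_id": doc._id },
         { "$set": { "history": history } }
     );
})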
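Whatever loop is used, it has to compute a de-duplicated copy of doc.history before writing it back with $set. A minimal plain-JavaScript sketch of that step, assuming duplicates are identified by date alone and that the first entry for each date is the one to keep; dedupeByDate is a hypothetical helper, not a shell or driver function:

```javascript
// Hypothetical helper: keep the first history entry for each date and drop
// the rest. Assumes every entry has a `date` field holding a Date object.
function dedupeByDate(history) {
  const seen = new Set();
  return history.filter(function (entry) {
    const key = entry.date.getTime(); // compare instants, not Date object identity
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

// Usage with the sample document from the question:
const history = [
  { date: new Date("2014-05-25T22:00:00Z"), value: 1 },
  { date: new Date("2014-05-26T22:00:00Z"), value: 1 },
  { date: new Date("2014-05-26T22:00:00Z"), value: 1 },
];
console.log(dedupeByDate(history).length); // 2
```

The input array is not modified; filter returns a new array, which is what gets passed to $set.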

Answer 2

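The problem with what you are proposing is that both operations act on the "history" array, so you actually end up with conflicting paths in the statement. The operations are not executed "in sequence" the way you think they are, and the conflict should produce an error when the query is parsed.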

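Also, you are essentially "wiping" the contents of the array. Unless you really intend to "push" an empty object {}, and your notation is only shorthand, there is currently no way to update the array based on the values already present in the document.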

So in the end the approach is to do this in a loop, which really isn't that bad:

 db.collection.find().forEach(function(doc) {
     // empty the array first...
     db.collection.update(
         { "_id": doc._id },
         { "$set": { "history": [] } }
     );
     // ...then add the entries back; $addToSet skips exact duplicates
     db.collection.update(
         { "_id": doc._id },
         { "$addToSet": { "history": { "$each": doc.history } } }
     );
 })
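Note that $addToSet only skips an element when an identical one is already in the array, comparing the whole subdocument: same fields, same values, same field order. Entries that share a date but differ in any other field all survive, so this approach removes exact duplicates only. A rough plain-JavaScript model of that behaviour, assuming "exact match" can be approximated by comparing JSON serializations (addToSetEach is a hypothetical name, and dates are shown as ISO strings for brevity):

```javascript
// Rough model of { $addToSet: { history: { $each: items } } } applied to an
// array: an item is appended only if no identical item is already present.
// JSON.stringify is used as an approximation of MongoDB's exact-match test;
// like MongoDB, it is sensitive to field order.
function addToSetEach(target, items) {
  const present = new Set(target.map(function (x) { return JSON.stringify(x); }));
  for (const item of items) {
    const key = JSON.stringify(item);
    if (!present.has(key)) {
      present.add(key);
      target.push(item);
    }
  }
  return target;
}

const d = "2014-05-26T22:00:00Z";
const result = addToSetEach([], [
  { date: d, value: 1 },
  { date: d, value: 1 }, // exact duplicate: skipped
  { date: d, value: 2 }, // same date, different value: kept
  { value: 1, date: d }, // same fields, different order: also kept
]);
console.log(result.length); // 3
```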

Of course, if you have MongoDB 2.6 or later, you can do this with Bulk operations, which makes things much more efficient:

 var count = 0;
 var bulk = db.collection.initializeOrderedBulkOp();

 db.collection.find().forEach(function(doc) {

    bulk.find({ "_id": doc._id }).update({
        "$set": { "history": [] }
    });
    bulk.find({ "_id": doc._id }).update({
        "$addToSet": { "history": { "$each": doc.history } }
    });
    count++;

    if ( count % 500 == 0 ) {
        bulk.execute();
        bulk = db.collection.initializeOrderedBulkOp();
        count = 0;
    }

 });

 if ( count > 0 )
     bulk.execute();

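This pairs up the operations and sends them in batches of 500 documents (1000 operations), which should stay safely within the 16MB BSON limit; you can of course adjust the batch size as needed. Although each update is still executed in order, the actual send/response to the server happens only once for every 500 items in this example.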
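To put the batching in numbers, assuming one million documents: the plain loop makes two update calls per document (two million in total), while the bulk version calls execute() about 2,000 times. A small simulation of the counter logic, assuming one round trip per execute() call (countRoundTrips is a hypothetical helper):

```javascript
// Counts how many times a "flush every N documents, then flush the remainder"
// loop would call bulk.execute() for a given number of documents.
function countRoundTrips(numDocs, docsPerBatch) {
  let count = 0;
  let executes = 0;
  for (let i = 0; i < numDocs; i++) {
    count++; // one document queued (two update operations in the loop above)
    if (count % docsPerBatch === 0) {
      executes++; // bulk.execute() and start a new bulk op
      count = 0;
    }
  }
  if (count > 0) executes++; // final partial batch
  return executes;
}

console.log(countRoundTrips(1000000, 500)); // 2000
console.log(countRoundTrips(1001, 500));    // 3
```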

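You could also consider using the aggregation framework to find only the documents that contain duplicates, which improves efficiency by not updating documents that don't need it: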

db.collection.aggregate([
    { "$project": {
       "orig": "$$ROOT",
       "history": 1
    }},
    { "$unwind": "$history" },
    { "$group": {
        "_id": { "id": "$_id", "date": "$history.date", "value": "$history.value" },
        "count": { "$sum": 1 },
        "orig": { "$first": "$orig" }
    }},
    { "$match": { "count": { "$gt": 1 } } },
    { "$group": {
        "_id": "$_id.id",
        "history": { "$first": "$orig.history" }
    }}
]).forEach(function(doc) {
    // same as above: doc._id plus the original doc.history
});
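Whatever form the pipeline takes, the document's _id needs to be part of the grouping key; otherwise identical entries from different documents are counted together. A plain-JavaScript model of the same grouping, useful for checking the logic on a few sample documents; idsWithDuplicates is a hypothetical helper, and entries are assumed to have date and value fields:

```javascript
// Groups each document's history entries by (date, value), counts them, and
// returns the _id of every document that has at least one group with count > 1.
function idsWithDuplicates(docs) {
  const ids = [];
  for (const doc of docs) {
    const counts = new Map(); // scoped per document, like grouping on _id
    for (const h of doc.history) {
      const key = h.date.getTime() + "|" + h.value;
      counts.set(key, (counts.get(key) || 0) + 1);
    }
    if ([...counts.values()].some(function (n) { return n > 1; })) ids.push(doc._id);
  }
  return ids;
}

const docs = [
  { _id: 1, history: [
    { date: new Date("2014-05-26T22:00:00Z"), value: 1 },
    { date: new Date("2014-05-26T22:00:00Z"), value: 1 },
  ] },
  { _id: 2, history: [
    { date: new Date("2014-05-25T22:00:00Z"), value: 1 },
    { date: new Date("2014-05-26T22:00:00Z"), value: 1 },
  ] },
];
console.log(idsWithDuplicates(docs)); // [ 1 ]
```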

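You can even use it as a springboard for removing the duplicates, so that each iteration of the loop sends a single $set update with the duplicates already removed: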

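 var count = 0;
 var bulk = db.collection.initializeOrderedBulkOp();

db.collection.aggregate([
    { "$unwind": "$history" },
    // one group per distinct (document, date, value) combination
    { "$group": {
        "_id": { "id": "$_id", "date": "$history.date", "value": "$history.value" }
    }},
    // rebuild one de-duplicated history array per document; only date and
    // value are kept (list any other history fields here too), and $group
    // does not guarantee the order of the rebuilt array
    { "$group": {
        "_id": "$_id.id",
        "history": { "$push": { "date": "$_id.date", "value": "$_id.value" } }
    }}
]).forEach(function(doc) {

    bulk.find({ "_id": doc._id }).update({
        "$set": { "history": doc.history }
    });
    count++;

    if ( count % 500 == 0 ) {
        bulk.execute();
        bulk = db.collection.initializeOrderedBulkOp();
        count = 0;
    }
});

 if ( count > 0 )
     bulk.execute();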

So there are several ways to remove those duplicate entries, which you can consider and adapt to your needs.

Answer 3

I was about to implement one of the scripts above when I realized this could be done in three steps in the mongo shell:

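var date = new Date(2014,4,26);
var temp = 'SOMESPECIALTEMPVALUE';

// 1. mark the first entry with this date in each document ($ = first match only)
db.test.update({'history.date': date},
           {$set: {
               'history.$.date' : temp
           }}, {multi:true})

// 2. pull every remaining entry that still has the original date
db.test.update({'history.date': temp},
           {$pull: {
               'history' : {date: date}
           }}, {multi:true})

// 3. restore the original date on the marked entry
db.test.update({'history.date': temp},
           {$set: {
               'history.$.date' : date
           }}, {multi:true})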

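This works because $ only updates the first matching subdocument. Then, with $pull, I remove all the remaining duplicates. Finally, I reset the temporary value back to the original date. This worked well for me because it was a one-off operation involving only three affected dates. Otherwise, I would probably have gone with the scripting approach.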
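A plain-JavaScript simulation of the three steps on a single history array makes the reasoning concrete: the positional update marks exactly one entry, the pull removes the remaining entries for that date, and the marked entry is restored. This models the intended behaviour, assuming dates are compared by their time value; threeStepDedupe is a hypothetical name, not a MongoDB API:

```javascript
// Simulates the three updates on one document's history array.
// TEMP plays the role of the placeholder value from the shell script.
const TEMP = "SOMESPECIALTEMPVALUE";

function threeStepDedupe(history, date) {
  const t = date.getTime();
  const hasDate = (h) => h.date instanceof Date && h.date.getTime() === t;

  // 1. positional $set: only the FIRST matching entry gets the placeholder
  const first = history.findIndex(hasDate);
  if (first === -1) return history; // document not matched, nothing to do
  history = history.slice();
  history[first] = Object.assign({}, history[first], { date: TEMP });

  // 2. $pull: remove every remaining entry that still has the original date
  history = history.filter((h) => !hasDate(h));

  // 3. positional $set: put the original date back on the placeholder entry
  const marked = history.findIndex((h) => h.date === TEMP);
  history[marked] = Object.assign({}, history[marked], { date: date });
  return history;
}

const date = new Date("2014-05-26T22:00:00Z");
const out = threeStepDedupe([
  { date: new Date("2014-05-25T22:00:00Z"), value: 1 },
  { date: new Date("2014-05-26T22:00:00Z"), value: 1 },
  { date: new Date("2014-05-26T22:00:00Z"), value: 1 },
], date);
console.log(out.length); // 2
```

The placeholder has to be a value that never occurs as a real date; otherwise step 3 could restore the wrong entry.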

3 solutions

Solution 1
Score 2, accepted, 2017-12-20 10:44:22

Solution 2
Score 1, 2014-06-05 01:11:41

Solution 3
Score 0, 2014-06-05 21:26:38
