Remove oldest N elements from document array

I have a document in MongoDB that contains a very large array (about 10k items). I want to keep only the latest 1k elements in the array (and so remove the first 9k). The document looks something like this:

    {
        "_id" : 'fakeid64',
        "Dropper" : [
            {
                "md5" : "fakemd5-1"
            },
            {
                "md5" : "fakemd5-2"
            },
            ...,
            {
                "md5": "fakemd5-10000"
            }
        ]
    }

How do I accomplish that?

The correct operation here is the $push operator with the $each and $slice modifiers. It may seem counter-intuitive to use $push to "remove" items from an array, but the intended use case becomes clear once you see the operation.

db.collection.update(
  { "_id": "fakeid64" },
  // Push nothing, then keep only the last 1000 elements
  { "$push": { "Dropper": { "$each": [], "$slice": -1000 } } }
)

You can in fact run this against your whole collection:

db.collection.update(
  { },
  { "$push": { "Dropper": { "$each": [], "$slice": -1000 } } },
  // Without "multi", only the first matching document is updated
  { "multi": true }
)

What happens here is that the $each modifier takes an array of items to "add" in the $push operation, which in this case we leave empty since we do not actually want to add anything. The $slice modifier, given a negative value, says to keep the last n elements of the array as the update is performed, which is exactly what you are asking for.
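To make the effect of an empty $each with a negative $slice concrete, here is a minimal sketch that models the behaviour on a plain JavaScript array. The helper `pushWithSlice` is hypothetical and written only for illustration; it is not a MongoDB API and it does not talk to a database.

```javascript
// Plain-JavaScript model of { $push: { field: { $each: [...], $slice: n } } }.
// pushWithSlice is a hypothetical helper for illustration, not a MongoDB API.
function pushWithSlice(array, each, slice) {
  // Step 1: append everything listed in $each (possibly nothing).
  const combined = array.concat(each);
  // Step 2: apply $slice. Zero empties the array, a negative value keeps
  // the last |slice| elements, a positive value keeps the first ones.
  if (slice === 0) return [];
  return slice < 0 ? combined.slice(slice) : combined.slice(0, slice);
}

// Stand-in for the "Dropper" array: fakemd5-1 ... fakemd5-10000.
const dropper = Array.from({ length: 10000 }, (_, i) => ({ md5: `fakemd5-${i + 1}` }));

// Push nothing and keep the last 1000 elements.
const trimmed = pushWithSlice(dropper, [], -1000);
console.log(trimmed.length);   // 1000
console.log(trimmed[0].md5);   // fakemd5-9001
console.log(trimmed[999].md5); // fakemd5-10000
```

The oldest 9000 entries are dropped and the newest 1000 survive, which mirrors what the update above does inside the server in a single request.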

The general "intended" case is to use $slice when adding new elements, to "maintain" the array at a "maximum" given length, which in this case would be 1000. So you would generally use it in tandem with actually "adding" new items, like this:

db.collection.update(
  { "_id": "fakeid64" },
  // Append one new entry, then trim the array back to its last 1000 elements
  { "$push": { "Dropper": { "$each": [{ "md5": "fakemd5-newEntry" }], "$slice": -1000 } } }
)

This would append the new item(s) provided in $each whilst also removing any items from the "start" of the array where the total length given the addition was greater than 1000.


It is stated incorrectly elsewhere that you would use $pullAll with a supplied list of the array content already existing in the document, but that operation is actually two requests to the database.

The misconception is that the request is sent as "one". It is not; it is basically interpreted as the longer form (with correct usage of .slice()):

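// Request 1 (read): everything except the last 1000 elements, i.e. the items to remove
var md5s = db.collection.findOne({ "_id": "fakeid64" }).Dropper.slice(0, -1000);

// Request 2 (write): pull those items by value
db.collection.update(
  { "_id": "fakeid64" },
  { "$pullAll": { "Dropper": md5s } }
)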

So you can see that this is not very efficient and is in fact quite dangerous when you consider that the state of the array within the document "could" possibly change in between the "read" of the array content and the actual "write" operation on update since they occur separately.
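The risk described above can be sketched without a database. The following is a simplified in-memory model, with assumptions stated up front: array elements are plain strings rather than `{ md5: ... }` objects, `pullAll` is a hypothetical helper that mimics removing elements by value, and the "other writer" is simulated by appending to the array between the read and the write.

```javascript
// In-memory stand-in for one document; no database is involved.
const doc = { _id: "fakeid64", Dropper: [] };
for (let i = 1; i <= 10000; i++) doc.Dropper.push(`fakemd5-${i}`);

// Hypothetical model of $pullAll: remove every element equal to a listed value.
function pullAll(array, values) {
  const toRemove = new Set(values);
  return array.filter((item) => !toRemove.has(item));
}

// Request 1 (read): the client computes the list of elements to remove.
const stale = doc.Dropper.slice(0, -1000);

// Another writer appends 500 entries before the second request arrives.
for (let i = 10001; i <= 10500; i++) doc.Dropper.push(`fakemd5-${i}`);

// Request 2 (write): $pullAll with the now-stale list.
const twoStep = pullAll(doc.Dropper, stale);

// Single-request alternative: the trim is computed from the array
// as it is at the moment of the write.
const atomic = doc.Dropper.slice(-1000);

console.log(twoStep.length); // 1500
console.log(atomic.length);  // 1000
```

In this sketch the two-step approach leaves 1500 elements because it removes only what it saw at read time, while the single-step trim always ends at 1000.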

This is why MongoDB has atomic operators such as $push with $slice, as demonstrated above. This approach is not only more efficient, it also takes into account the actual "state" of the document at the time the modification occurs.

You can use the $pullAll operator. Suppose you use the python/pymongo driver:

# Reads the array first, then pulls its first 9000 elements (two separate requests)
yourcollection.update_one(
    {'_id': 'fakeid64'},
    {'$pullAll': {'Dropper': yourcollection.find_one({'_id': 'fakeid64'})['Dropper'][:9000]}}
)

or in mongo shell:

db.yourcollection.update(
  { _id: 'fakeid64'}, 
  {$pullAll: {'Dropper': db.yourcollection.findOne({'_id' : 'fakeid64'})['Dropper'].slice(0,9000)}}
)

(*) Having said that, it would be much better if you didn't allow your document(s) to grow this much in the first place.

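This is just a sketch of the query. Basically, you can unwind the array, use $limit to select the oldest elements, then use the cursor's forEach to remove them, as below:

db.your_collection.aggregate([
    { $match : { _id : 'fakeid64' } },
    // One output document per array element, in array order
    { $unwind : "$Dropper" },
    // The first 9000 elements are the oldest ones; no $skip, so the newest 1000 are kept
    { $limit : 9000 }
]).forEach(function(doc){
    // One update request per element; $pull removes every element equal to this value
    db.your_collection.update({ _id : doc._id }, { $pull : { Dropper : doc.Dropper } });
});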

From the MongoDB docs:

db.students.update(
  { _id: 1 },
  { $push: { scores: {
      $each: [ { attempt: 3, score: 7 }, { attempt: 4, score: 4 } ],
      $sort: { score: 1 },
      $slice: -3
  } } }
)

This update uses the $push operator with the $each modifier to append 2 new elements to the array, the $sort modifier to order the elements by ascending (1) score, and the $slice modifier to keep the last 3 elements of the ordered array.
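The combined effect of $each, $sort and $slice can be traced on a plain array. This is a minimal sketch only: `pushSortSlice` is a hypothetical helper, not a MongoDB API, and the starting contents of `scores` are an assumption made for illustration since they are not shown above.

```javascript
// Hypothetical model of $push with $each, $sort and $slice (not a MongoDB API).
function pushSortSlice(array, each, sortKey, direction, slice) {
  // Append first, then sort, then slice.
  const combined = array.concat(each);
  // direction 1 sorts ascending, -1 descending, on a numeric field.
  combined.sort((a, b) => direction * (a[sortKey] - b[sortKey]));
  if (slice === 0) return [];
  return slice < 0 ? combined.slice(slice) : combined.slice(0, slice);
}

// Assumed starting state of the scores array (illustrative data).
const scores = [ { attempt: 1, score: 10 }, { attempt: 2, score: 8 } ];

const updatedScores = pushSortSlice(
  scores,
  [ { attempt: 3, score: 7 }, { attempt: 4, score: 4 } ],
  "score", 1, -3
);
console.log(updatedScores);
// [ { attempt: 3, score: 7 }, { attempt: 2, score: 8 }, { attempt: 1, score: 10 } ]
```

Under that assumed starting state, the lowest score (4) is the one dropped, because the slice is applied after the ascending sort.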


 