
Remove Records from a MongoDB Collection Based on Individual User Pairs

I have a set of documents (messages) in a MongoDB collection, as shown below. I want to preserve only the latest 500 records for each individual user pair. Users are identified by sentBy and sentTo.

/* 1 */
{
    "_id" : ObjectId("5f1c1b00c62e9b9aafbe1d6c"),
    "sentAt" : ISODate("2020-07-25T11:44:00.004Z"),
    "readAt" : ISODate("1970-01-01T00:00:00.000Z"),
    "msgBody" : "dummy text",
    "msgType" : "text",
    "sentBy" : ObjectId("54d6732319f899c704b21ef7"),
    "sentTo" : ObjectId("54d6732319f899c704b21ef5"),
}

/* 2 */
{
    "_id" : ObjectId("5f1c1b3cc62e9b9aafbe1d6d"),
    "sentAt" : ISODate("2020-07-25T11:45:00.003Z"),
    "readAt" : ISODate("1970-01-01T00:00:00.000Z"),
    "msgBody" : "dummy text",
    "msgType" : "text",
    "sentBy" : ObjectId("54d6732319f899c704b21ef9"),
    "sentTo" : ObjectId("54d6732319f899c704b21ef8"),
}

/* 3 */
{
    "_id" : ObjectId("5f1c1b78c62e9b9aafbe1d6e"),
    "sentAt" : ISODate("2020-07-25T11:46:00.003Z"),
    "readAt" : ISODate("1970-01-01T00:00:00.000Z"),
    "msgBody" : "dummy text",
    "msgType" : "text",
    "sentBy" : ObjectId("54d6732319f899c704b21ef6"),
    "sentTo" : ObjectId("54d6732319f899c704b21ef8"),
}

/* 4 */
{
    "_id" : ObjectId("5f1c1c2e1449dd9bbef28575"),
    "sentAt" : ISODate("2020-07-25T11:49:02.012Z"),
    "readAt" : ISODate("1970-01-01T00:00:00.000Z"),
    "msgBody" : "dummy text",
    "msgType" : "text",
    "sentBy" : ObjectId("54cfcf93e2b8994c25077924"),
    "sentTo" : ObjectId("54d6732319f899c704b21ef5"),
}

/* and so on... assume it to be 10k+ */

The algorithm I came up with is:

  • Group the messages by user pair, treating sentBy and sentTo as interchangeable (the OR logic)
  • Sort each group's records by sentAt in descending order
  • Limit each group to 500 records
  • Collect the array of _id values that should be preserved
  • Pass those IDs to a new .deleteMany() query with a $nin condition
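The five steps above can be sketched in plain JavaScript over an in-memory array, which makes the intended result easy to check before touching the database. This is only a model under stated assumptions: `pairKey` and `idsToDelete` are hypothetical names (not MongoDB APIs), ObjectIds are modeled as plain strings, and `limit` stands in for 500.

```javascript
// Hypothetical in-memory model of the question's algorithm.
// Messages are plain objects; ObjectIds are modeled as strings.
function pairKey(a, b) {
  // Order-independent key, so A->B and B->A fall into the same conversation.
  return a < b ? `${a}|${b}` : `${b}|${a}`;
}

function idsToDelete(messages, limit = 500) {
  // Step 1: group by user pair.
  const groups = new Map();
  for (const m of messages) {
    const key = pairKey(m.sentBy, m.sentTo);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(m);
  }
  // Steps 2-4: sort each group newest first, keep `limit`, collect their _ids.
  const keep = new Set();
  for (const group of groups.values()) {
    group
      .sort((x, y) => y.sentAt - x.sentAt)
      .slice(0, limit)
      .forEach((m) => keep.add(m._id));
  }
  // Step 5: everything not kept is what deleteMany({_id: {$nin: keep}}) would remove.
  return messages.filter((m) => !keep.has(m._id)).map((m) => m._id);
}

// Usage: users "a" and "b" exchange three messages; keep only the latest two per pair.
const sample = [
  { _id: "m1", sentAt: new Date("2020-07-25T11:44:00Z"), sentBy: "a", sentTo: "b" },
  { _id: "m2", sentAt: new Date("2020-07-25T11:45:00Z"), sentBy: "b", sentTo: "a" },
  { _id: "m3", sentAt: new Date("2020-07-25T11:46:00Z"), sentBy: "a", sentTo: "b" },
  { _id: "m4", sentAt: new Date("2020-07-25T11:47:00Z"), sentBy: "c", sentTo: "a" },
];
console.log(idsToDelete(sample, 2)); // [ 'm1' ]
```

Only the oldest message of the a/b conversation is marked for deletion; the single a/c message is untouched.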

Please help; I have struggled with this a lot and have not had any success. Many thanks :)

Depending on scale, I would do one of the following two things:

  1. Assuming the scale is fairly small and you can group the entire collection in a reasonable time, I would do something similar to what you suggested:
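db.collection.aggregate([
    // Oldest first, so the newest messages end up at the end of each pushed array.
    {
        $sort: {
            sentAt: 1
        }
    },
    // One group per conversation: the key is always [larger id, smaller id],
    // so A -> B and B -> A messages land in the same group.
    {
        $group: {
            _id: {
                $cond: [
                    {$gt: ["$sentBy", "$sentTo"]},
                    ["$sentBy", "$sentTo"],
                    ["$sentTo", "$sentBy"]
                ]
            },
            roots: {$push: "$$ROOT"}
        }
    },
    // Keep only the last (newest) 500 messages of each conversation.
    {
        $project: {
            roots: {$slice: ["$roots", -500]}
        }
    },
    // Turn the kept messages back into ordinary top-level documents.
    {
        $unwind: "$roots"
    },
    {
        $replaceRoot: {
            newRoot: "$roots"
        }
    },
    // Overwrite the collection with the result; "this_collection" is a
    // placeholder for the name of the collection being aggregated.
    {
        $out: "this_collection"
    }
])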

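The $sort stage has to come first because there is no convenient way to sort the pushed array after the $group (the $sortArray operator only arrived in MongoDB 5.2). The $cond in the $group stage builds an order-independent key, always [larger id, smaller id], which achieves what the $or in your plan was meant to do, since $or cannot be used as a grouping key. Finally, instead of retrieving the result and then calling deleteMany with $nin, you can use $out to replace the current collection's contents with the kept documents.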
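To see why sorting first and then taking `$slice: ["$roots", -500]` keeps the newest messages, here is a rough plain-JavaScript model of the same stages. It is an illustration only: `simulatePipeline` is a hypothetical name, and JavaScript string comparison stands in for MongoDB's ObjectId ordering.

```javascript
// Rough in-memory model of the $sort -> $group -> $slice -> $unwind pipeline.
// Hypothetical helper; string comparison stands in for ObjectId ordering.
function simulatePipeline(messages, limit = 500) {
  // $sort: { sentAt: 1 } (oldest first); copy so the input is not mutated.
  const sorted = [...messages].sort((x, y) => x.sentAt - y.sentAt);

  // $group with the $cond key: [larger id, smaller id], regardless of direction.
  const groups = new Map();
  for (const doc of sorted) {
    const key = doc.sentBy > doc.sentTo
      ? JSON.stringify([doc.sentBy, doc.sentTo])
      : JSON.stringify([doc.sentTo, doc.sentBy]);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(doc); // $push keeps the sorted (ascending) order
  }

  // $slice: ["$roots", -limit] keeps the last `limit` entries, i.e. the newest;
  // $unwind + $replaceRoot then flatten them back into plain documents.
  return [...groups.values()].flatMap((roots) => roots.slice(-limit));
}

const docs = [
  { _id: "m1", sentAt: new Date("2020-07-25T11:44:00Z"), sentBy: "a", sentTo: "b" },
  { _id: "m2", sentAt: new Date("2020-07-25T11:45:00Z"), sentBy: "b", sentTo: "a" },
  { _id: "m3", sentAt: new Date("2020-07-25T11:46:00Z"), sentBy: "a", sentTo: "b" },
];
console.log(simulatePipeline(docs, 2).map((d) => d._id)); // [ 'm2', 'm3' ]
```

Both directions of the a/b conversation share one group, and the negative slice drops the oldest message.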

  2. If the scale is too large for that, iterate user by user and apply your original approach to each user's conversations. Here is a quick example:

// Everyone who has sent at least one message; every conversation has a sender.
let userIds = await db.collection.distinct("sentBy");

// Users whose conversations have already been trimmed.
let done = [];
for (let i = 0; i < userIds.length; i++) {
    // Messages in which this user is either sender or recipient. The two $nin
    // conditions skip conversations with users handled in an earlier iteration
    // (so Z/Y is not processed again as Y/Z). They are optional, but if you
    // drop them here, drop them from the deleteMany filter too.
    const filter = {
        $or: [
            {sentTo: userIds[i]},
            {sentBy: userIds[i]}
        ],
        sentTo: {$nin: done},
        sentBy: {$nin: done}
    };

    let matches = await db.collection.aggregate([
        {$match: filter},
        // Oldest first, so $slice -500 below keeps the newest messages.
        {$sort: {sentAt: 1}},
        // One group per conversation partner of userIds[i].
        {
            $group: {
                _id: {
                    $cond: [
                        {$eq: ["$sentBy", userIds[i]]},
                        "$sentTo",
                        "$sentBy"
                    ]
                },
                roots: {$push: "$$ROOT"}
            }
        },
        {$project: {roots: {$slice: ["$roots", -500]}}},
        {$unwind: "$roots"},
        // Collect the _ids of every message to keep for this user.
        {
            $group: {
                _id: null,
                keepers: {$push: "$roots._id"}
            }
        }
    ]).toArray();

    if (matches.length) {
        // Delete this user's matched messages that are not among the keepers.
        await db.collection.deleteMany({
            ...filter,
            _id: {$nin: matches[0].keepers}
        });
    }

    done.push(userIds[i]);
}
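The per-user loop can likewise be modeled in memory to check the `done` bookkeeping: a message is matched only while neither participant is in `done`, so each conversation is trimmed exactly once, when its first participant comes up. This is a hypothetical sketch; `trimPerUser` is an illustrative name, and the returned array of `_id`s stands in for the documents left after the `deleteMany` calls.

```javascript
// Hypothetical in-memory model of the per-user loop (approach 2).
function trimPerUser(messages, limit = 500) {
  let remaining = [...messages];
  const userIds = [...new Set(messages.map((m) => m.sentBy))]; // distinct("sentBy")
  const done = [];
  for (const userId of userIds) {
    // $match: this user is on either side, and neither side was processed before.
    const matches = remaining.filter(
      (m) =>
        (m.sentTo === userId || m.sentBy === userId) &&
        !done.includes(m.sentTo) &&
        !done.includes(m.sentBy)
    );
    // $sort + $group by partner + $slice: keep the newest `limit` per partner.
    const byPartner = new Map();
    for (const m of [...matches].sort((x, y) => x.sentAt - y.sentAt)) {
      const partner = m.sentBy === userId ? m.sentTo : m.sentBy;
      if (!byPartner.has(partner)) byPartner.set(partner, []);
      byPartner.get(partner).push(m._id);
    }
    const keepers = new Set([...byPartner.values()].flatMap((ids) => ids.slice(-limit)));
    // deleteMany: drop matched documents whose _id is not a keeper.
    const matched = new Set(matches.map((m) => m._id));
    remaining = remaining.filter((m) => !matched.has(m._id) || keepers.has(m._id));
    done.push(userId);
  }
  return remaining.map((m) => m._id);
}

const convo = [
  { _id: "m1", sentAt: new Date("2020-07-25T11:44:00Z"), sentBy: "a", sentTo: "b" },
  { _id: "m2", sentAt: new Date("2020-07-25T11:45:00Z"), sentBy: "b", sentTo: "a" },
  { _id: "m3", sentAt: new Date("2020-07-25T11:46:00Z"), sentBy: "a", sentTo: "b" },
  { _id: "m4", sentAt: new Date("2020-07-25T11:47:00Z"), sentBy: "c", sentTo: "a" },
];
console.log(trimPerUser(convo, 2)); // [ 'm2', 'm3', 'm4' ]
```

When user "b" comes up, its only partner "a" is already in `done`, so nothing is matched twice.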
