I have a set of documents (messages) in a MongoDB collection, as shown below. I want to preserve only the latest 500 records for each user pair. Users are identified by sentBy and sentTo.
/* 1 */
{
"_id" : ObjectId("5f1c1b00c62e9b9aafbe1d6c"),
"sentAt" : ISODate("2020-07-25T11:44:00.004Z"),
"readAt" : ISODate("1970-01-01T00:00:00.000Z"),
"msgBody" : "dummy text",
"msgType" : "text",
"sentBy" : ObjectId("54d6732319f899c704b21ef7"),
"sentTo" : ObjectId("54d6732319f899c704b21ef5"),
}
/* 2 */
{
"_id" : ObjectId("5f1c1b3cc62e9b9aafbe1d6d"),
"sentAt" : ISODate("2020-07-25T11:45:00.003Z"),
"readAt" : ISODate("1970-01-01T00:00:00.000Z"),
"msgBody" : "dummy text",
"msgType" : "text",
"sentBy" : ObjectId("54d6732319f899c704b21ef9"),
"sentTo" : ObjectId("54d6732319f899c704b21ef8"),
}
/* 3 */
{
"_id" : ObjectId("5f1c1b78c62e9b9aafbe1d6e"),
"sentAt" : ISODate("2020-07-25T11:46:00.003Z"),
"readAt" : ISODate("1970-01-01T00:00:00.000Z"),
"msgBody" : "dummy text",
"msgType" : "text",
"sentBy" : ObjectId("54d6732319f899c704b21ef6"),
"sentTo" : ObjectId("54d6732319f899c704b21ef8"),
}
/* 4 */
{
"_id" : ObjectId("5f1c1c2e1449dd9bbef28575"),
"sentAt" : ISODate("2020-07-25T11:49:02.012Z"),
"readAt" : ISODate("1970-01-01T00:00:00.000Z"),
"msgBody" : "dummy text",
"msgType" : "text",
"sentBy" : ObjectId("54cfcf93e2b8994c25077924"),
"sentTo" : ObjectId("54d6732319f899c704b21ef5"),
}
/* and so on... assume it to be 10k+ */
The algorithm that came to my mind is:
1. Find the _ids that should be preserved (the latest 500 per user pair).
2. Call deleteMany() with a $nin condition on those _ids.

Please help, I have struggled a lot with this and have not had any success. Many thanks :)
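A minimal in-memory sketch of that algorithm, assuming the messages are already loaded as plain objects with string ids (it does not talk to MongoDB; the helper name buildDeleteFilter and the limit parameter are hypothetical). It groups by unordered user pair, keeps the newest limit messages of each pair, and returns the filter you would pass to deleteMany():

```javascript
// Hypothetical helper: keeps the newest `limit` messages per unordered
// sentBy/sentTo pair and returns a deleteMany() filter for everything else.
// Ids are plain strings here; real ObjectIds would need .toString() first.
function buildDeleteFilter(messages, limit) {
  const byPair = new Map();
  for (const msg of messages) {
    // Sort the two ids so A->B and B->A share one key.
    const [a, b] = [msg.sentBy, msg.sentTo].sort();
    const key = `${a}|${b}`;
    if (!byPair.has(key)) byPair.set(key, []);
    byPair.get(key).push(msg);
  }

  const keepIds = [];
  for (const group of byPair.values()) {
    // Newest first, then keep the first `limit` entries.
    group.sort((x, y) => y.sentAt - x.sentAt);
    for (const msg of group.slice(0, limit)) keepIds.push(msg._id);
  }

  return { _id: { $nin: keepIds } };
}

// Usage with a limit of 2 instead of 500:
const sample = [
  { _id: "m1", sentAt: new Date("2020-07-25T11:44:00Z"), sentBy: "u1", sentTo: "u2" },
  { _id: "m2", sentAt: new Date("2020-07-25T11:45:00Z"), sentBy: "u2", sentTo: "u1" },
  { _id: "m3", sentAt: new Date("2020-07-25T11:46:00Z"), sentBy: "u1", sentTo: "u2" },
  { _id: "m4", sentAt: new Date("2020-07-25T11:47:00Z"), sentBy: "u3", sentTo: "u1" },
];
const filter = buildDeleteFilter(sample, 2);
// Keeps m3 and m2 (newest two of u1/u2) and m4 (only u1/u3 message), so m1 would be deleted.
console.log(filter);
```

With a real collection, the same list of ids would go into deleteMany(filter); for 10k+ documents, doing the grouping on the server is usually preferable to loading everything into the client.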
Depending on scale, I would do one of the following two things. The first is a single aggregation that rewrites the collection:
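db.collection.aggregate([
  // Sort first: $push keeps the incoming order, so each pair's array ends up oldest -> newest.
  { $sort: { sentAt: 1 } },
  {
    $group: {
      // Order-independent pair key: the larger id always comes first,
      // so messages A->B and B->A land in the same group.
      _id: {
        $cond: [
          { $gt: ["$sentBy", "$sentTo"] },
          ["$sentBy", "$sentTo"],
          ["$sentTo", "$sentBy"],
        ],
      },
      roots: { $push: "$$ROOT" },
    },
  },
  // Keep only the last (newest) 500 messages of each pair.
  { $project: { roots: { $slice: ["$roots", -500] } } },
  // Turn the kept messages back into top-level documents.
  { $unwind: "$roots" },
  { $replaceRoot: { newRoot: "$roots" } },
  // Replace the collection with the trimmed result.
  { $out: "this_collection" },
])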
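The $sort stage has to come first because $push preserves the incoming document order, and there is no convenient way to sort the inner array after the $group (MongoDB 5.2 added $sortArray, but older versions lack it). The $cond in the $group stage builds an order-independent key for each user pair (the larger id first), which replaces the $or logic that can't be used inside $group. Finally, instead of retrieving the result and then calling deleteMany with $nin, you can use $out to rewrite the current collection in one step.

On a larger collection, the second option works through one user at a time, so each aggregation only handles that user's conversations, and then removes the surplus with deleteMany and $nin: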
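To see why the $cond key is needed and why $slice with -500 keeps the newest messages, here is a small plain-JavaScript mirror of those steps. It runs in memory only; pairKey and keepLatest are hypothetical helper names, and string ids stand in for ObjectIds (24-character hex strings compare in the same order as the ObjectId bytes MongoDB compares):

```javascript
// Mirrors {$cond: [{$gt: ["$sentBy", "$sentTo"]}, ["$sentBy", "$sentTo"], ["$sentTo", "$sentBy"]]}:
// the larger id always comes first, so both directions of a conversation
// produce the same group key.
function pairKey(msg) {
  return msg.sentBy > msg.sentTo
    ? [msg.sentBy, msg.sentTo]
    : [msg.sentTo, msg.sentBy];
}

// Mirrors $sort {sentAt: 1} -> $group with $push -> $slice [-limit]:
// after an ascending sort, the last `limit` elements are the newest ones.
function keepLatest(messages, limit) {
  const groups = new Map();
  const sorted = [...messages].sort((x, y) => x.sentAt - y.sentAt);
  for (const msg of sorted) {
    const key = pairKey(msg).join("|");
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(msg); // like $push, keeps the sorted order
  }
  // A negative slice start takes elements from the end of each array.
  return [...groups.values()].flatMap((roots) => roots.slice(-limit));
}

const convo = [
  { _id: "a", sentAt: new Date("2020-07-25T11:44:00Z"), sentBy: "u2", sentTo: "u1" },
  { _id: "b", sentAt: new Date("2020-07-25T11:46:00Z"), sentBy: "u1", sentTo: "u2" },
  { _id: "c", sentAt: new Date("2020-07-25T11:45:00Z"), sentBy: "u2", sentTo: "u1" },
];
console.log(pairKey(convo[0]).join("|") === pairKey(convo[1]).join("|")); // true
console.log(keepLatest(convo, 2).map((m) => m._id)); // [ 'c', 'b' ]
```

Without the $cond (grouping on sentBy and sentTo as-is), messages a and b would fall into two different groups and each direction would be limited separately.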
let userIds = await db.collection.distinct("sentBy");
let done = []; // users whose conversations have already been trimmed
for (const userId of userIds) {
  // All messages between userId and a partner not handled in an earlier iteration.
  // The two $nin conditions are optional; they only avoid processing Z-Y again after Y-Z.
  // Both parties must be outside `done`: an $or of the two would always be true,
  // because userId itself is never in `done`.
  const filter = {
    $or: [{ sentTo: userId }, { sentBy: userId }],
    sentTo: { $nin: done },
    sentBy: { $nin: done },
  };
  const matches = await db.collection.aggregate([
    { $match: filter },
    { $sort: { sentAt: 1 } },
    {
      // Group by the other party in each conversation.
      $group: {
        _id: { $cond: [{ $eq: ["$sentBy", userId] }, "$sentTo", "$sentBy"] },
        roots: { $push: "$$ROOT" },
      },
    },
    // Newest 500 messages per partner.
    { $project: { roots: { $slice: ["$roots", -500] } } },
    { $unwind: "$roots" },
    // Collect every _id to keep into a single array.
    { $group: { _id: null, keepers: { $push: "$roots._id" } } },
  ]).toArray();
  if (matches.length) {
    // Same documents as the $match above, minus the ones we keep.
    await db.collection.deleteMany({ ...filter, _id: { $nin: matches[0].keepers } });
  }
  done.push(userId);
}
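The point of the done list is that each unordered pair is processed only once: when the loop reaches user Y, a conversation with an earlier user Z was already trimmed while handling Z. A small in-memory sketch of that bookkeeping (the pairsToProcess helper is hypothetical, ids are strings, no MongoDB involved) shows which pairs each iteration would touch when both parties must be absent from done:

```javascript
// For each user in loop order, list the partners whose conversation would be
// trimmed in that iteration, skipping partners already in `done`.
function pairsToProcess(messages, userIds) {
  const done = [];
  const processed = [];
  for (const user of userIds) {
    const partners = new Set();
    for (const msg of messages) {
      if (msg.sentBy !== user && msg.sentTo !== user) continue; // the $or on userId
      // Both parties must be outside `done` (the $nin conditions).
      if (done.includes(msg.sentBy) || done.includes(msg.sentTo)) continue;
      partners.add(msg.sentBy === user ? msg.sentTo : msg.sentBy);
    }
    for (const partner of partners) processed.push([user, partner]);
    done.push(user);
  }
  return processed;
}

const msgs = [
  { sentBy: "u1", sentTo: "u2" },
  { sentBy: "u2", sentTo: "u1" },
  { sentBy: "u2", sentTo: "u3" }, // u3 never sends, but is still covered via u2
];
const senders = [...new Set(msgs.map((m) => m.sentBy))]; // like distinct("sentBy")
console.log(pairsToProcess(msgs, senders)); // [ [ 'u1', 'u2' ], [ 'u2', 'u3' ] ]
```

If the check only required one of the two parties to be outside done, the second iteration would list u2/u1 again, since u2 itself is never in done at that point.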