我如何优化这个 Mongo 聚合管道

Question

I am working on an application that shows a table of aggregate data I sum, part of the dashboard I want to show a sample of the last few events that came into the collection:我正在开发一个应用程序，该应用程序显示我汇总的汇总数据表，这是仪表板的一部分，我想显示进入集合的最后几个事件的示例：

I am executing this pipeline but it crashes OOM because I have millions of documents in the collection, is there anything I can do?我正在执行此管道，但它导致 OOM 崩溃，因为我在集合中有数百万个文档，我能做些什么吗？ I want to get the last 5 texts我想得到最后 5 个文本

Events.aggregate([
    {
        $match: {event_type: {$in: [1,5,10,12]}}
    },
    {
        $group: {
            _id: "$event_type", avgTime: {$avg: "$time"}, avgCost: {$avg: "$cost"}, maxCost: {$max: "$cost"}, maxTime: {$max: "$time"}, texts: {$push: "$text"}
        }
    },
    {
        $addFields: {
            texts: { $slice: ["$text", {$subtract: [{$size: "$text"}, 5]}, {$size: "$text"}]}
        }
    }
])

Answer 1

If you're using Mongo version 5.2+ then you can use the new $lastN operator, like so:如果您使用的是 Mongo 5.2+ 版，那么您可以使用新的$lastN运算符，如下所示：

db.collection.aggregate([
  {
    $match: {
      event_type: {
        $in: [
          1,
          5,
          10,
          12
        ]
      }
    }
  },
  {
    $group: {
      _id: "$event_type",
      avgTime: {
        $avg: "$time"
      },
      avgCost: {
        $avg: "$cost"
      },
      maxCost: {
        $max: "$cost"
      },
      maxTime: {
        $max: "$time"
      },
      texts: {
        $lastN: {
          n: 5,
          input: "$text"
        }
      }
    }
  }
])

Mongo Playground蒙戈游乐场

If you're on a lesser Mongo version, I recommend you just split this into 2 calls, and use a find instead that can utilize indexes:如果您使用的是较小的 Mongo 版本，我建议您将其拆分为 2 个调用，并使用可以利用索引的find代替：

const results = await db.collection.aggregate([
  {
    $match: {
      event_type: {
        $in: [
          1,
          5,
          10,
          12
        ]
      }
    }
  },
  {
    $group: {
      _id: "$event_type",
      avgTime: {
        $avg: "$time"
      },
      avgCost: {
        $avg: "$cost"
      },
      maxCost: {
        $max: "$cost"
      },
      maxTime: {
        $max: "$time"
      }
    }
  }
]);

// should use promise all or bluebird here instead, kept it a for loop for readability.
for (let i = 0; i < results.length; i++) {
    const lastTexts = await db.collection.find({event_type: results[i]._id}).sort({_id: -1}).limit(5);
    results[i].texts = lastTexts.map(v => v.text)
}

我如何优化这个 Mongo 聚合管道

问题描述

1 个解决方案

解决方案1
1 已采纳 2022-08-24 15:56:30

我如何优化这个 Mongo 聚合管道

问题描述

1 个解决方案

解决方案1 1 已采纳 2022-08-24 15:56:30

解决方案1
1 已采纳 2022-08-24 15:56:30