How can I efficiently paginate and filter millions of records in MongoDB?

Question

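I know there are many questions on this topic. Most of the answers work, but their performance is really poor when there are millions of records.

I have a collection with 10,000,000 records.

At first I was using mongoose paginator v2, and each page took around 8 seconds without filtering and 25 seconds with filtering. That is fairly decent compared with the other answers I found by googling. Then I read about aggregate (in some questions here on the same topic) and it was a marvel: 7 ms to get each page without filtering, no matter which page it is: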

  const pageSize = +req.query.pagesize;
  const currentPage = +req.query.currentpage;

  let recordCount;
  ServiceClass.find().count().then((count) =>{
    recordCount = count;
    ServiceClass.aggregate().skip(currentPage).limit(pageSize).exec().then((documents) => {
      res.status(200).json({
        message: msgGettingRecordsSuccess,
        serviceClasses: documents,
        count: recordCount,
      });
    })
    .catch((error) => {
      res.status(500).json({ message: msgGettingRecordsError });
    });
  }).catch((error) => {
    res.status(500).json({ message: "Error getting record count" });
  });

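My problem is with filtering. aggregate does not work like find, so my conditions do not work. I read the docs on aggregate and tried [ {$match: {description: {$regex: regex}}} ] inside aggregate as a start, but it returned nothing. This is my current working function for filtering and paginating (it takes 25 seconds):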

  const pageSize = +req.query.pagesize;
  const currentPage = +req.query.currentpage;

  const filter = req.params.filter;
  const regex = new RegExp(filter, 'i');

  ServiceClass.paginate({
    $or:[
      {code: { $regex: regex }},
      {description: { $regex: regex }},
    ]
  },{limit: pageSize, page: currentPage}).then((documents)=>{
      res.status(200).json({
        message: msgGettingRecordsSuccess,
        serviceClasses: documents
      });
    }).catch((error) => {
    res.status(500).json({ message: "Error getting the records." });
  });

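Both code and description are indexed. code has a unique index and description has a plain index. I need to find documents that contain a string in either the code or the description field.

What is the most efficient way to filter and paginate when you have millions of records?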

Answer 1

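When you have millions of records, the most efficient way to filter and paginate is to use MongoDB's built-in pagination and filtering features, such as skip(), limit(), and the $match operator in an aggregate() pipeline.

You can use skip() to skip a given number of documents and limit() to cap the number of documents returned. You can also use the $match operator to filter documents by specific conditions.

To filter documents on the code or description field, you can combine the $match operator with the $or operator, as follows: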

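ServiceClass.aggregate([
    // Keep only documents whose code or description matches the regex.
    { $match: { $or: [{ code: { $regex: regex } }, { description: { $regex: regex } }] } },
    // $skip counts documents, not pages (currentPage is assumed to be 1-based).
    { $skip: (currentPage - 1) * pageSize },
    { $limit: pageSize }
])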
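
Because $skip takes a number of documents rather than a page number, the page has to be converted into an offset first. A minimal sketch of that conversion, assuming 1-based page numbers as in the question's paginate() call; the helper names pageToSkip and pagingStages are hypothetical and not part of Mongoose or MongoDB:

```javascript
// Hypothetical helper: convert a 1-based page number and a page size
// into the number of documents to skip. Invalid input falls back to page 1.
function pageToSkip(currentPage, pageSize) {
  const page = Math.max(1, Math.floor(Number(currentPage)) || 1);
  const size = Math.max(1, Math.floor(Number(pageSize)) || 1);
  return (page - 1) * size;
}

// Hypothetical helper: build the $skip/$limit stages for one page.
function pagingStages(currentPage, pageSize) {
  const size = Math.max(1, Math.floor(Number(pageSize)) || 1);
  return [{ $skip: pageToSkip(currentPage, size) }, { $limit: size }];
}
```

The returned stages can be appended after the $match stage, for example [matchStage, ...pagingStages(currentPage, pageSize)].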

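You can also use the $text operator instead of $regex, which performs more efficiently for text search queries (it requires a text index on the searched fields).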
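
A rough sketch of what the $text variant could look like. Two caveats apply: $text needs a text index, and it matches whole (stemmed) words rather than arbitrary substrings, so it only fits if word-level matching is acceptable for the "contains" requirement in the question. In an aggregation, the $match stage that contains $text must be the first stage. The names textIndexSpec and textSearchPipeline are hypothetical:

```javascript
// Assumed one-time setup: a single text index covering both fields
// (a collection can have at most one text index). With Mongoose this
// spec could be passed to schema.index(textIndexSpec).
const textIndexSpec = { code: 'text', description: 'text' };

// Hypothetical helper: build a paginated pipeline that uses $text.
// currentPage is assumed to be 1-based.
function textSearchPipeline(searchTerm, currentPage, pageSize) {
  return [
    // $text must appear in the first stage of the pipeline.
    { $match: { $text: { $search: String(searchTerm) } } },
    { $skip: (currentPage - 1) * pageSize },
    { $limit: pageSize },
  ];
}
```

The pipeline would then be run as ServiceClass.aggregate(textSearchPipeline(filter, currentPage, pageSize)).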

It is also important to make sure the relevant fields (code and description) are indexed, as this will greatly speed up the search.
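
How much an index helps depends on the shape of the regex. According to the MongoDB documentation, case-insensitive regex queries generally cannot use indexes effectively, while a case-sensitive prefix expression (one that starts with ^) can be served by a range scan over the index. The sketch below assumes prefix matching is an acceptable trade-off, which is narrower than the "contains" search asked for in the question; escapeRegExp and prefixFilter are hypothetical helper names:

```javascript
// Escape characters that have a special meaning in a regular expression,
// so user input is matched literally.
function escapeRegExp(text) {
  return String(text).replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: build a case-sensitive, anchored filter on both fields.
function prefixFilter(userInput) {
  const prefix = new RegExp('^' + escapeRegExp(userInput));
  return {
    $or: [{ code: { $regex: prefix } }, { description: { $regex: prefix } }],
  };
}
```

The resulting object can be used as the body of a $match stage or passed to find().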

You may have to adjust the query to fit your specific use case and data.

Answer 2

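The code below fetches the paginated results and the total number of documents for that particular query from the database in a single call.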

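const pageSize = +req.query.pagesize;
const currentPage = +req.query.currentpage;
// Same case-insensitive filter as in the question.
const regex = new RegExp(req.params.filter, 'i');
// Number of documents to skip for a 1-based page number.
const skip = currentPage * pageSize - pageSize;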
const query = [
    {
      $match: { $or: [{ code: { $regex: regex } }, { description: { $regex: regex } }] },
    },
    {
      $facet: {
        result: [
          {
            $skip: skip,
          },
          {
            $limit: pageSize,
          },
          {
            $project: {
              createdAt: 0,
              updatedAt: 0,
              __v: 0,
            },
          },
        ],
        count: [
          {
            $count: "count",
          },
        ],
      },
    },
    {
      $project: {
        result: 1,
        count: {
          $arrayElemAt: ["$count", 0],
        },
      },
    },
  ];
const result = await ServiceClass.aggregate(query);
console.log(result)
// result is a one-element array: [{ result: [...], count: { count: <total> } }]
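
aggregate() resolves to an array, and a pipeline ending in $facet yields exactly one element. When nothing matches, the count facet is an empty array, so $arrayElemAt produces no value and the count field is missing from the output. A small sketch of unpacking that shape defensively; toPage is a hypothetical helper name:

```javascript
// Hypothetical helper: turn the raw aggregate() output of the pipeline above
// into the documents of the page, the total match count, and the page count.
function toPage(aggregateResult, pageSize) {
  const [first = {}] = aggregateResult;
  const documents = first.result || [];
  // count is absent when no document matched the filter.
  const total = first.count ? first.count.count : 0;
  return { documents, total, totalPages: Math.ceil(total / pageSize) };
}
```

It would be called as toPage(result, pageSize) right after the aggregate() call.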

Hope this helps.


Answer 1
1 2023-01-19 19:26:28

Answer 2
1 accepted 2023-01-20 13:34:10
