How to efficiently paginate and filter millions of records in MongoDB?

Question

I know there are a LOT of questions regarding this subject. And while most of the answers work, they perform really poorly when there are millions of records.

I have a collection with 10,000,000 records.

At first I was using mongoose paginator v2 and it took around 8s to get each page with no filtering, and 25s when filtering. Fairly decent compared to the other answers I found googling around. Then I read about aggregate (in a question on the same topic here) and it was a marvel: 7 ms to get each page without filtering, no matter which page it is:

  const pageSize = +req.query.pagesize;
  const currentPage = +req.query.currentpage;

  let recordCount;
  ServiceClass.find().count().then((count) =>{
    recordCount = count;
    ServiceClass.aggregate().skip(currentPage).limit(pageSize).exec().then((documents) => {
      res.status(200).json({
        message: msgGettingRecordsSuccess,
        serviceClasses: documents,
        count: recordCount,
      });
    })
    .catch((error) => {
      res.status(500).json({ message: msgGettingRecordsError });
    });
  }).catch((error) => {
    res.status(500).json({ message: "Error getting record count" });
  });

What I'm having issues with is filtering. aggregate doesn't really work like find, so my conditions are not working. I read the docs about aggregate and tried [ {$match: {description: {$regex: regex}}} ] inside aggregate as a start, but it did not return anything. This is my current working function for filtering and pagination (which takes 25s):

  const pageSize = +req.query.pagesize;
  const currentPage = +req.query.currentpage;

  const filter = req.params.filter;
  const regex = new RegExp(filter, 'i');

  ServiceClass.paginate({
    $or:[
      {code: { $regex: regex }},
      {description: { $regex: regex }},
    ]
  },{limit: pageSize, page: currentPage}).then((documents)=>{
      res.status(200).json({
        message: msgGettingRecordsSuccess,
        serviceClasses: documents
      });
    }).catch((error) => {
    res.status(500).json({ message: "Error getting the records." });
  });

code and description are both indexed. code has a unique index and description has a normal index. I need to search for documents that contain a string in either the code or the description field.

What is the most efficient way to filter and paginate when you have millions of records?

Answer 1

The most efficient way to filter and paginate when you have millions of records is to use MongoDB's built-in pagination and filtering features, such as the $skip, $limit, and $match stages of the aggregate() pipeline.

The $skip stage skips a given number of documents, and the $limit stage caps the number of documents returned. The $match stage filters the documents based on the conditions you give it.

To filter your documents on the code or description field, you can use the $match stage with the $or operator, like this:

ServiceClass.aggregate([
    { $match: { $or: [{ code: { $regex: regex } }, { description: { $regex: regex } }] } },
    // $skip takes a number of documents, not a page number (currentPage is 1-based).
    { $skip: (currentPage - 1) * pageSize },
    { $limit: pageSize }
])
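
As a minimal sketch of the pipeline above, the page arithmetic and the regex construction can be wrapped in a small helper. The names escapeRegex and buildPagePipeline are hypothetical, and the sketch assumes that currentPage is 1-based and that the user's filter should be matched literally rather than as a regular expression.

```javascript
// Sketch only: escapeRegex and buildPagePipeline are hypothetical helper names.

// Escape regex metacharacters so the user's input is matched literally.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build the $match / $skip / $limit pipeline for a 1-based page number.
function buildPagePipeline(filter, currentPage, pageSize) {
  const regex = new RegExp(escapeRegex(filter), 'i');
  // Page 1 skips nothing, page 2 skips one page worth of documents, and so on.
  const skip = (Math.max(currentPage, 1) - 1) * pageSize;
  return [
    { $match: { $or: [{ code: { $regex: regex } }, { description: { $regex: regex } }] } },
    { $skip: skip },
    { $limit: pageSize },
  ];
}

// Usage (assumes the ServiceClass model from the question):
// const documents = await ServiceClass.aggregate(
//   buildPagePipeline(req.params.filter, currentPage, pageSize)
// );
```

Keeping the pipeline construction in a pure function makes the skip calculation easy to check without a database connection.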

You can also use the $text operator instead of $regex, which performs more efficiently for text search queries. Note that $text requires a text index on the searched fields and matches whole words rather than arbitrary substrings.
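
A rough sketch of what a $text based pipeline could look like follows. It assumes a text index covering code and description has already been created; the helper name buildTextSearchPipeline and the collection name in the comment are assumptions, not part of the original question.

```javascript
// Sketch only: buildTextSearchPipeline is a hypothetical helper name.
// Assumes a text index already exists on both fields, for example created in
// the mongo shell (the collection name here is an assumption):
//   db.serviceclasses.createIndex({ code: "text", description: "text" })

function buildTextSearchPipeline(searchTerm, currentPage, pageSize) {
  return [
    // A $match that uses $text has to be the first stage of the pipeline.
    { $match: { $text: { $search: searchTerm } } },
    // Sort by relevance, with _id as a tiebreaker so the page order is deterministic.
    { $sort: { score: { $meta: 'textScore' }, _id: 1 } },
    { $skip: (currentPage - 1) * pageSize },
    { $limit: pageSize },
  ];
}

// Usage (assumes the ServiceClass model from the question):
// const documents = await ServiceClass.aggregate(
//   buildTextSearchPipeline(req.params.filter, currentPage, pageSize)
// );
```

Because $text matches on words, this only fits the question's use case if searching by whole words is acceptable instead of arbitrary substrings.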

It's also important to make sure that the relevant fields (code and description) have indexes, as that will greatly speed up the search process. Keep in mind, though, that a case-insensitive or unanchored $regex generally cannot use an index efficiently.
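
One case where the indexes on code and description can help a $regex is a case-sensitive pattern anchored at the start of the string. The sketch below assumes that a "starts with" search is acceptable, which is a narrower search than the "contains" search in the question; escapeRegex and buildPrefixFilter are hypothetical helper names.

```javascript
// Sketch only: escapeRegex and buildPrefixFilter are hypothetical helper names.

// Escape regex metacharacters so the user's input is matched literally.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build a case-sensitive "starts with" filter on both indexed fields.
function buildPrefixFilter(prefix) {
  const pattern = '^' + escapeRegex(prefix);
  return {
    $or: [{ code: { $regex: pattern } }, { description: { $regex: pattern } }],
  };
}

// Usage (assumes the ServiceClass model from the question); explain() can be
// used to check whether the query planner actually picks the indexes:
// const plan = await ServiceClass.find(buildPrefixFilter('AB')).explain('executionStats');
```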

You might have to adjust the query according to your specific use case and data.

Answer 2

The code below gets the paginated result from the database together with the total number of documents matching the query, in a single aggregation.

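const pageSize = +req.query.pagesize;
const currentPage = +req.query.currentpage;
// Same case-insensitive regex as in the question.
const regex = new RegExp(req.params.filter, 'i');
// currentPage is 1-based, so page 1 skips nothing.
const skip = currentPage * pageSize - pageSize;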
const query = [
    {
      $match: { $or: [{ code: { $regex: regex } }, { description: { $regex: regex } }] },
    },
    {
      $facet: {
        result: [
          {
            $skip: skip,
          },
          {
            $limit: pageSize,
          },
          {
            $project: {
              createdAt: 0,
              updatedAt: 0,
              __v: 0,
            },
          },
        ],
        count: [
          {
            $count: "count",
          },
        ],
      },
    },
    {
      $project: {
        result: 1,
        count: {
          $arrayElemAt: ["$count", 0],
        },
      },
    },
  ];
const result = await ServiceClass.aggregate(query);
console.log(result)
// result is an array holding a single document of the form
// { result: [...], count: { count: <total> } }; count is absent when nothing matches.
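
As a small illustration of how the aggregation output above might be turned into an API response, here is a sketch of an unwrapping helper. The name toPageResponse and the shape of the returned object are assumptions; the input shape follows the $facet and $project stages shown above.

```javascript
// Sketch only: toPageResponse is a hypothetical helper name.
// aggregateResult is what ServiceClass.aggregate(query) resolves to:
// an array with one document, { result: [...], count: { count: <total> } }.
function toPageResponse(aggregateResult, pageSize) {
  const [first = {}] = aggregateResult;
  const documents = first.result || [];
  // When nothing matches, the count facet is empty and the field is absent.
  const total = first.count ? first.count.count : 0;
  return { documents, total, totalPages: Math.ceil(total / pageSize) };
}

// Usage inside the route handler:
// const page = toPageResponse(await ServiceClass.aggregate(query), pageSize);
// res.status(200).json({ serviceClasses: page.documents, count: page.total });
```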

Hope it helps.

2 answers

Answer 1: 1, 2023-01-19 19:26:28

Answer 2: 1, accepted, 2023-01-20 13:34:10
