简体   繁体   中英

How to Speed Up Count on MongoDB View

I have been troubleshooting why a MongoDB view I created is so slow. The view targets the transactions collection, and returns records that have an openBalance that is greater than 0 . I also run some additional aggregation stages to shape the data the way I want it.

In order to speed up the execution of the view it makes use of an index on the targeted collection by matching on the indexed field in stage one of the view's aggregation pipeline, like so:

// View Stage 1

{ "transactions.details.openBalance" : { "$exists" : true, "$gt" : 0.0 } }

After much investigation I have determined that the aggregation from the view returns data very quickly. What's slow is the count that's run as part of the endpoint:

let count = await db.collection('view_transactions_report').find().count();

So what I'm trying to figure out now is why the count is so much slower on the view than on the underlying collection, and what I can do to speed it up. Or, perhaps there's an alternative way to generate the count?

The underlying collection has something like 800,000 records, but the count returns quickly. But the count on the view, which only returns a filtered set of 10,000 of those initial 800,000 records, returns much more slowly. In terms of specifics, I'm talking about 3/4 of a second for the count on the collection to return, verses six seconds for the count on the mongo view to return.

So, first off, why is the count so much slower on the view (with it's much smaller data set) than on the underlying collection, and secondly, what can I do to address the speed of the count for the view?

I have a couple other aggregation queries I'm running, to determine totalCustomers and totalOpenBalance , that also seem to run slow (see code below).

The relevant part of my endpoint function code looks like this:

// previous code

  let count = await db.collection('view_transaction_report').find(search).count();

  let totalCustomers = await db.collection('view_transaction_report').find(search).count({
     $sum: "customer._id"
   });

  let result = {};

  if (totalCustomers > 0) {
    result = await db.collection('view_transaction_report').aggregate([{
        $match: search,
      },
      {
        $group: {
          _id: null,
          totalOpenBalance: {
            $sum: '$lastTransaction.details.openBalance'
          }
        }
      }
    ]).next();
  }

  db.collection('view_transaction_report').find(search).skip(skip).limit(pagesize).forEach(function (doc) {
    docs.push(doc);
  }, function (err) {
    if (err) {
      if (!ioOnly) {
        return next(err);
      } else {
        return res(err);
      }
    }
    if (ioOnly) {
      res({
        sessionId: sessID,
        count: count,
        data: docs,
        totalCustomers: totalCustomers,
        totalOpenBalance: result.totalOpenBalance
      });
    } else {
      res.send({
        count: count,
        data: docs,
        totalCustomers: totalCustomers,
        totalOpenBalance: result.totalOpenBalance
      });
    }
  });

In terms of executionStats , this is what shows for the queryPlanner section of the generated view:

            "queryPlanner" : {
                "plannerVersion" : 1.0, 
                "namespace" : "vio.transactions", 
                "indexFilterSet" : false, 
                "parsedQuery" : {
                    "$and" : [
                        {
                            "transactions.details.openBalance" : {
                                "$gt" : 0.0
                            }
                        }, 
                        {
                            "transactions.destails.openBalance" : {
                                "$exists" : true
                            }
                        }
                    ]
                }, 
                "winningPlan" : {
                    "stage" : "CACHED_PLAN", 
                    "inputStage" : {
                        "stage" : "FETCH", 
                        "filter" : {
                            "transactions.details.openBalance" : {
                                "$exists" : true
                            }
                        }, 
                        "inputStage" : {
                            "stage" : "IXSCAN", 
                            "keyPattern" : {
                                "transactions.details.openBalance" : 1.0
                            }, 
                            "indexName" : "openBalance", 
                            "isMultiKey" : true, 
                            "multiKeyPaths" : {
                                "transactions.details.openBalance" : [
                                    "transactions", 
                                    "transactions.details"
                                ]
                            }, 
                            "isUnique" : false, 
                            "isSparse" : true, 
                            "isPartial" : false, 
                            "indexVersion" : 2.0, 
                            "direction" : "forward", 
                            "indexBounds" : {
                                "transactions.details.openBalance" : [
                                    "(0.0, inf.0]"
                                ]
                            }
                        }
                    }
                }, 
                "rejectedPlans" : [

                ]
            }

In the comments, @Wan Bachtiar mentioned that "openBalance" looks to be a multikey index . To clarify, yes, in the targeted collection , the "openBalance" field is an embedded field within an array. This is the case even though, in the view, the data is shaped in such a way that "openBalance" is an embedded field that is not within an array.

The multikey index on the targeted collection is where the issue lies, because instead of a 1 for 1 document situation, Mongo needs to look through every array element pertaining to this "openBalance" field, which, logically, dramatically increases the scan time - because sometimes there are many, many array elements pertaining to this particular field.

After some further checking, I realized I can address this issue by changing how I populate "openBalance" to our mongo collection via the ETL. By making this change I'll be able to make "openBalance" a standard index, rather than a multikey index, which in turn will allow mongo to search a much smaller data set in order to return my counts.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM