MongoDB $group + $sum aggregation is very slow

Question

I have an aggregation query in MongoDB:

[{
    $group: {
        _id: '$status',
        status: {
            $sum: 1
        }
    }
}]
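To make the semantics of that stage concrete, here is a plain-JavaScript sketch (not MongoDB code) of what $group with _id: '$status' and status: { $sum: 1 } produces. The sample documents and the function name are hypothetical. It shows two things: the output field named status actually holds the per-status count, and every input document has to be visited to produce the counts.

```javascript
// Illustration only: mimics the grouping semantics of
// { $group: { _id: "$status", status: { $sum: 1 } } } in memory.
function groupCountByStatus(docs) {
  const counts = new Map();
  for (const doc of docs) {
    // _id: "$status" -> the group key is the document's status value;
    // a missing status field groups under null, as in MongoDB
    const key = doc.status === undefined ? null : doc.status;
    // status: { $sum: 1 } -> add 1 for each document in the group
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  // One output document per distinct key; the count is stored in "status"
  return Array.from(counts, ([key, n]) => ({ _id: key, status: n }));
}

// Hypothetical sample documents
const sample = [
  { _id: 1, status: "new" },
  { _id: 2, status: "done" },
  { _id: 3, status: "new" },
  { _id: 4 },
];
console.log(groupCountByStatus(sample));
```

For the sample above the sketch returns three groups: "new" with 2, "done" with 1, and null with 1 (the document without a status field). MongoDB itself does not guarantee any particular order for $group output.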

It is running on a collection that has ~80 million documents. The status field is indexed, yet the query is very slow and runs for around 60 seconds or more.

I ran explain() on the query, but it didn't get me much further:

{
        "explainVersion" : "1",
        "stages" : [
                {
                        "$cursor" : {
                                "queryPlanner" : {
                                        "namespace" : "loa.document",
                                        "indexFilterSet" : false,
                                        "parsedQuery" : {

                                        },
                                        "queryHash" : "B9878693",
                                        "planCacheKey" : "8EAA28C6",
                                        "maxIndexedOrSolutionsReached" : false,
                                        "maxIndexedAndSolutionsReached" : false,
                                        "maxScansToExplodeReached" : false,
                                        "winningPlan" : {
                                                "stage" : "PROJECTION_SIMPLE",
                                                "transformBy" : {
                                                        "status" : 1,
                                                        "_id" : 0
                                                },
                                                "inputStage" : {
                                                        "stage" : "COLLSCAN",
                                                        "direction" : "forward"
                                                }
                                        },
                                        "rejectedPlans" : [ ]
                                }
                        }
                },
                {
                        "$group" : {
                                "_id" : "$status",
                                "status" : {
                                        "$sum" : {
                                                "$const" : 1
                                        }
                                }
                        }
                }
        ],
        "serverInfo" : {
                "host" : "rack-compute-2",
                "port" : 27017,
                "version" : "5.0.6",
                "gitVersion" : "212a8dbb47f07427dae194a9c75baec1d81d9259"
        },
        "serverParameters" : {
                "internalQueryFacetBufferSizeBytes" : 104857600,
                "internalQueryFacetMaxOutputDocSizeBytes" : 104857600,
                "internalLookupStageIntermediateDocumentMaxSizeBytes" : 104857600,
                "internalDocumentSourceGroupMaxMemoryBytes" : 104857600,
                "internalQueryMaxBlockingSortMemoryUsageBytes" : 104857600,
                "internalQueryProhibitBlockingMergeOnMongoS" : 0,
                "internalQueryMaxAddToSetBytes" : 104857600,
                "internalDocumentSourceSetWindowFieldsMaxMemoryBytes" : 104857600
        },
        "command" : {
                "aggregate" : "document",
                "pipeline" : [
                        {
                                "$group" : {
                                        "_id" : "$status",
                                        "status" : {
                                                "$sum" : 1
                                        }
                                }
                        }
                ],
                "explain" : true,
                "cursor" : {

                },
                "lsid" : {
                        "id" : UUID("a07e17fe-65ff-4d38-966f-7517b7a5d3f2")
                },
                "$db" : "loa"
        },
        "ok" : 1
}

I can see that it does a full COLLSCAN, but I can't understand why.
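The winning plan is a tree in which data flows from the innermost inputStage outward: COLLSCAN reads every document, and PROJECTION_SIMPLE then keeps only status (the one field the $group stage depends on) before the documents reach $group. No IXSCAN stage appears, so the status index is not used at all. The sketch below lists the stages of such a plan. It assumes the classic explain layout shown above (server 5.0.6); other server versions and sharded clusters nest the plan differently, and the function name planStages is hypothetical.

```javascript
// Collects stage names from an explain() winningPlan, outermost stage first.
// Follows "inputStage" (single child) and "inputStages" (e.g. OR plans).
function planStages(node, out = []) {
  if (!node || typeof node !== "object") return out;
  if (node.stage) out.push(node.stage);
  if (node.inputStage) planStages(node.inputStage, out);
  if (Array.isArray(node.inputStages)) {
    for (const child of node.inputStages) planStages(child, out);
  }
  return out;
}

// winningPlan copied from the explain() output in the question
const winningPlan = {
  stage: "PROJECTION_SIMPLE",
  transformBy: { status: 1, _id: 0 },
  inputStage: { stage: "COLLSCAN", direction: "forward" },
};

const stages = planStages(winningPlan);
console.log(stages); // [ 'PROJECTION_SIMPLE', 'COLLSCAN' ]
console.log(stages.includes("COLLSCAN") ? "full collection scan" : "no COLLSCAN");
```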

I plan to support a couple hundred million (or even a billion) documents in that collection, but this problem derails those plans for no apparent reason.
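A rough way to size that concern, under the crude assumption that run time grows linearly with document count at the observed rate (about 80 million documents in 60 seconds, roughly 1.33 million documents per second). The helper name is hypothetical, and because the observed time is "60 seconds or more", these figures are lower bounds rather than predictions.

```javascript
// Linear extrapolation: assumes run time is proportional to document count
// and that the observed throughput holds at larger sizes (an assumption).
function estimateSeconds(observedDocs, observedSeconds, targetDocs) {
  // Equivalent to targetDocs / (observedDocs / observedSeconds), reordered
  // so the intermediate rate is never rounded.
  return (targetDocs * observedSeconds) / observedDocs;
}

console.log(estimateSeconds(80e6, 60, 200e6)); // 150
console.log(estimateSeconds(80e6, 60, 1e9));   // 750 (12.5 minutes)
```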

Answer 1

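The pipeline has no $match or $sort before $group, so there is no filter or sort that the status index could serve, and the query planner only considers a collection scan (note the empty rejectedPlans list in your explain output). You can tell the planner to use the index anyway with a hint, which can let it read the status values from the index instead of loading every full document: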

db.test.explain("executionStats").aggregate(
   [
     { $group: { _id: "$status", status: { $sum: 1 } } }
   ],
   { hint: "status_1" }
)
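For reference, the same hinted aggregation can be expressed as a raw aggregate command (without explain), in the same shape as the "command" section of the explain output in the question. The collection name "document" and the index name "status_1" are assumptions taken from that output and from this answer, and the builder function is hypothetical.

```javascript
// Builds a raw "aggregate" command document for the per-status count,
// with an index hint. Collection and index names are parameters.
function groupCountCommand(collection, indexName) {
  return {
    aggregate: collection,
    pipeline: [{ $group: { _id: "$status", status: { $sum: 1 } } }],
    hint: indexName, // index name string; a key pattern such as { status: 1 } is also accepted
    cursor: {},
  };
}

// Assumed names: collection "document", index "status_1"
const cmd = groupCountCommand("document", "status_1");
console.log(JSON.stringify(cmd));
```

In mongosh, connected to the loa database, the resulting object can be passed to db.runCommand(cmd). Passing the key pattern { status: 1 } as the hint instead of the name avoids depending on the index name at all.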

Make sure the index name in the hint matches the name of the index you created; db.test.getIndexes() will show you the exact name.
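If the index was created without an explicit name, MongoDB derives the default name from the key pattern by joining each field and its value with underscores, which is why an index on { status: 1 } is normally called status_1. A small sketch of that convention follows (the function name is hypothetical); getIndexes() remains the authoritative source, because a custom name option overrides the default.

```javascript
// Reproduces MongoDB's default index-naming convention: "field_value"
// for each key in the pattern, all joined with "_".
function defaultIndexName(keyPattern) {
  return Object.entries(keyPattern)
    .map(([field, value]) => `${field}_${value}`)
    .join("_");
}

console.log(defaultIndexName({ status: 1 }));              // status_1
console.log(defaultIndexName({ item: 1, quantity: -1 })); // item_1_quantity_-1
```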

solution1
1 2022-06-07 20:03:13

MongoDB $group + $sum aggregation is very slow

Question

1 answers

solution1 1 2022-06-07 20:03:13

solution1
1 2022-06-07 20:03:13