简体   繁体   中英

MongoDB query over indexed field very slow

I have a collection with large number of documents (32 809 900). All documents have a field called soft_deleted . I've also created soft_deleted: 1 field. Then I've tested a few different queries related to the field. Here are my results:

Query                                Number of results  Time in milliseconds
db.cgm_egvs
  .find().count()                    32809900           90
db.cgm_egvs
  .find({soft_deleted: true})        2820897            688
  .count()
db.cgm_egvs
  .find({soft_deleted: false})       29989003           3983
  .count()
db.cgm_egvs
  .find({soft_deleted: null})        0                  42
  .count()
db.cgm_egvs
  .find({soft_deleted: {$ne: true}}) 29989003           82397
  .count()

Why are query times so different between these queries? I'd expect finding documents where soft_deleted is true or false to take the same amount of time. More importantly, why is querying by != true so much slower than any other query?

The soft_deleted field has very low cardinality; it has only two distinct values true and false, so you will not have much benefit having an index on this field. Normally indexes perform better on fields with high cardinality.

In case of {soft_deleted: true} query, the number of rows with soft_deleted: true is very less compared to {soft_deleted: false}, and mongodb had to scan much lower number of index entries. So the {soft_deleted: true} query took less time.

Similarly the query {soft_deleted: null} took less time as the index has only 2 distinct values, and in this case much lower scanning is required.

Your final query is using $ne operator, and $ne operator is not selective (selectivity is the ability of a query to narrow results using the index). https://docs.mongodb.com/v3.0/faq/indexes/#using-ne-and-nin-in-a-query-is-slow-why . So it took much more time to execute.

I am not sure why other queries are slow (as I am awaiting explain dump), but in case $ne the point is that we have extra step added there, so that means this function is wrapped in first equal and then not equal -> see explain dump below in parsedQuery section and find not step

db.getCollection('a1').find({Level:{$ne:"Info"}}).explain()

"queryPlanner" : {
    "plannerVersion" : 1,
    "namespace" : "logi.a1",
    "indexFilterSet" : false,
    "parsedQuery" : {
        "$not" : {
            "Level" : {
                "$eq" : "Info"
            }
        }
    },

db.getCollection('a1').find({Level:"Info"}).explain()

"queryPlanner" : {
    "plannerVersion" : 1,
    "namespace" : "logi.a1",
    "indexFilterSet" : false,
    "parsedQuery" : {
        "Level" : {
            "$eq" : "Info"
        }
    },

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM