
MongoDB sort and regex query in an efficient way

    db.location.find(
        { "$or": [
            { "country_lc": /^unit/, "docType": "country" },
            { "region_lc": /^unit/, "docType": "region" },
            { "city_lc": /^unit/, "docType": "city" }
        ]},
        { "country": 1, "region": 1, "city": 1, "docType": 1 }
    ).sort({ "country_lc": 1, "region_lc": 1, "city_lc": 1 })

This query is taking a very long time in MongoDB. How can I run it efficiently? Below is the explain() output for the query above. The location collection holds 442161 documents in total, and I need to do prefix searching. I have created indexes on (country_lc, docType), (region_lc, docType), (city_lc, docType) and (country_lc, region_lc, city_lc). My MongoDB version is 2.4.9.

{
"cursor" : "BtreeCursor country_lc_1_region_lc_1_city_lc_1",
"isMultiKey" : false,
"n" : 29,
"nscannedObjects" : 76935,
"nscanned" : 442161,
"nscannedObjectsAllPlans" : 76935,
"nscannedAllPlans" : 442161,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 79,
"nChunkSkips" : 0,
"millis" : 81531,
"indexBounds" : {
    "country_lc" : [
        [
            {
                "$minElement" : 1
            },
            {
                "$maxElement" : 1
            }
        ]
    ],
    "region_lc" : [
        [
            {
                "$minElement" : 1
            },
            {
                "$maxElement" : 1
            }
        ]
    ],
    "city_lc" : [
        [
            {
                "$minElement" : 1
            },
            {
                "$maxElement" : 1
            }
        ]
    ]
},
"server" : "prashanta:27017"

}

You could try creating a text index on the country_lc, region_lc and city_lc fields. A collection can have at most one text index, so all three fields go into a single index on the location collection:

db.location.ensureIndex( { "country_lc": "text", "region_lc": "text", "city_lc": "text" } )

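Text indexes are a new feature in MongoDB 2.4. They were added to support text search of string content in the documents of a collection. Note that in 2.4 text search has to be enabled explicitly, and that it matches whole stemmed words rather than prefixes. Please take a look at the official documentation for performance hints.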

Moreover, you can try rewriting the query as

db.location.find(
    { "docType": { "$in": [ "country", "region", "city" ] },
      "$or": [
          { "country_lc": /^unit/ },
          { "region_lc": /^unit/ },
          { "city_lc": /^unit/ }
      ]
    },
    { "country": 1, "region": 1, "city": 1, "docType": 1 }
).sort({ "country_lc": 1, "region_lc": 1, "city_lc": 1 })

(Caution: this may or may not be equivalent to your query, depending on the structure of your documents.)
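To make the caution concrete, here is a minimal sketch in plain JavaScript (no database needed) that applies both predicates to a few hypothetical documents. The sample data and the helper names `startsWithUnit`, `matchesOriginal` and `matchesRewritten` are assumptions made for illustration; only the field names come from the question.

```javascript
// Hypothetical sample documents; only the field names come from the question.
const docs = [
  { country_lc: "united states", docType: "country" },
  { country_lc: "united states", region_lc: "texas", city_lc: "austin", docType: "city" },
  { country_lc: "france", region_lc: "unitville", docType: "region" },
];

// Same test as /^unit/, guarded so that a missing field never matches.
function startsWithUnit(value) {
  return typeof value === "string" && /^unit/.test(value);
}

// Original query: each $or branch pairs one field with its own docType.
function matchesOriginal(d) {
  return (
    (startsWithUnit(d.country_lc) && d.docType === "country") ||
    (startsWithUnit(d.region_lc) && d.docType === "region") ||
    (startsWithUnit(d.city_lc) && d.docType === "city")
  );
}

// Rewritten query: any listed docType combined with any matching field.
function matchesRewritten(d) {
  return (
    ["country", "region", "city"].includes(d.docType) &&
    (startsWithUnit(d.country_lc) || startsWithUnit(d.region_lc) || startsWithUnit(d.city_lc))
  );
}

// Documents that the rewritten query returns but the original does not.
const onlyRewritten = docs.filter((d) => matchesRewritten(d) && !matchesOriginal(d));
console.log(onlyRewritten);
```

If city and region documents carry their parent country_lc, as the second sample document does, the rewritten query returns extra rows, so the two forms only agree when each document stores just the field for its own docType.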

You are running 2.4.9, which means you do not have index intersection, and an $or cannot use an index for sorting. This answer might be different on 2.6, which has both.

There are multiple problems with your query, and it is considered a "bad" query in MongoDB, quite apart from the regex.

OK, let's take the sort first. In 2.4.9 a sort on an $or will not use an index correctly ( https://jira.mongodb.org/browse/SERVER-1205 ), which means that although you do not have scanAndOrder, you do have an nscanned count covering every entry in the index, one per document in your collection.

The nscanned is 442,161 to be precise. An $or is in fact several queries ( http://docs.mongodb.org/manual/reference/operator/query/or/#or-clauses-and-indexes ) run at the same time, whose results are merged and then returned; you can see proof of this in the fact that an $or can use more than one index, even in 2.4.9.

I cannot see which indexes your individual clauses are using, but I will assume they might not fit into an index either.

The problem is that 2.4.9 simply cannot do an $or and a sort with proper indexes. You must choose between indexing either the $or or the sort, and even then the index only partially covers the query.

You have a couple of things you can do to fix this:

  • Upgrade to 2.6, where $or and sort can use an index.
  • Even in 2.6 you might have problems due to the added docType field. You could try adding it to your index right after country_lc. You might also be able to add it to the end of the index and have it work OK, but bear in mind that it will then scan all entries below your matches in country_lc.
  • You might be able to take advantage of index intersection in 2.6 to get around this problem for each $or clause, but as the documentation states ( http://docs.mongodb.org/manual/reference/operator/query/or/#or-and-sort-operations ), $or-specific indexes will be dropped, so I don't think this will work.
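One more option on 2.4, offered only as a sketch and not something proposed in the list above: since an $or is already several queries whose results are merged, the three clauses can be issued as three separate finds, each able to use its own two-field index, with the merge and sort done in application code. The helper names `compareLocations` and `mergeResults` and the sample documents below are hypothetical; the code is plain JavaScript, and the shell query that would feed it appears only in a comment.

```javascript
// Sort keys in the same order as the original .sort() call.
const SORT_KEYS = ["country_lc", "region_lc", "city_lc"];

// Compare two documents key by key. A missing field sorts before any string,
// which is how MongoDB orders missing values in an ascending sort.
function compareLocations(a, b) {
  for (const key of SORT_KEYS) {
    const x = a[key];
    const y = b[key];
    if (x === y) continue;
    if (x === undefined) return -1;
    if (y === undefined) return 1;
    return x < y ? -1 : 1;
  }
  return 0;
}

// Merge any number of result arrays, drop duplicates by _id, then sort.
function mergeResults(...resultSets) {
  const byId = new Map();
  for (const docs of resultSets) {
    for (const doc of docs) {
      // String() so that non-primitive ids still compare by value.
      byId.set(String(doc._id), doc);
    }
  }
  return Array.from(byId.values()).sort(compareLocations);
}

// Hypothetical result sets. In the mongo shell each one would come from a
// single-clause query such as:
//   db.location.find({ country_lc: /^unit/, docType: "country" }).toArray()
const countries = [{ _id: 1, country_lc: "united states", docType: "country" }];
const regions = [{ _id: 2, country_lc: "france", region_lc: "unitregion", docType: "region" }];
const cities = [
  { _id: 3, country_lc: "canada", region_lc: "ontario", city_lc: "unity", docType: "city" },
];

const merged = mergeResults(countries, regions, cities);
console.log(merged.map((d) => d._id));
```

With only 29 matching documents in the question's explain output, sorting the merged array in the application should be cheap compared with scanning the whole index.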

It doesn't matter which way you shake it, this is a horrible query that will always result in a full collection scan, or at least a full index scan.

Consider just this one document:

{
    "country_lc" : "unitize",
    "region_lc" : "unitmost",
    "city_lc" : "unitleast"
}

The query cannot possibly anchor on any position in the index: no matter how you order the indexed fields, no ordering yields a range that matches, due to the "exclusive" (as in excluding everything) nature of the $or operator.

So none of these indexes, nor any other ordering of the fields, will actually be used to narrow the query:

db.location.ensureIndex({
    "country_lc" : 1,
    "region_lc" : 1,
    "city_lc" : 1
})

db.location.ensureIndex({
    "region_lc" : 1,
    "city_lc" : 1,
    "country_lc" : 1
})

db.location.ensureIndex({
    "region_lc" : 1,
    "country_lc" : 1,
    "city_lc" : 1
})
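A rough way to picture this is to treat a compound index as a list of key tuples sorted field by field, then look at where the matches for each field fall. The sketch below is plain JavaScript over hypothetical keys, not MongoDB's actual B-tree code; `compareKeys`, `hitPositions` and `isContiguous` are names made up for illustration.

```javascript
// Hypothetical index keys as [country_lc, region_lc, city_lc] tuples.
const entries = [
  ["france", "unitaire", "paris"],
  ["united states", "texas", "austin"],
  ["canada", "ontario", "unity"],
  ["united kingdom", "england", "london"],
  ["brazil", "unitas", "rio"],
  ["germany", "bavaria", "munich"],
];

// A compound index keeps keys sorted by the first field, then the second,
// then the third.
function compareKeys(a, b) {
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) return a[i] < b[i] ? -1 : 1;
  }
  return 0;
}
const index = entries.slice().sort(compareKeys);

// Positions in the sorted index whose field at fieldPos starts with the prefix.
function hitPositions(fieldPos, prefix) {
  const hits = [];
  index.forEach((key, pos) => {
    if (key[fieldPos].startsWith(prefix)) hits.push(pos);
  });
  return hits;
}

// True when the positions form one unbroken run, so that a single range scan
// could cover them.
function isContiguous(positions) {
  return positions.every((p, i) => i === 0 || p === positions[i - 1] + 1);
}

const countryHits = hitPositions(0, "unit");
const regionHits = hitPositions(1, "unit");
console.log(countryHits, isContiguous(countryHits));
console.log(regionHits, isContiguous(regionHits));
```

In this toy index the country_lc matches sit next to each other, but the region_lc matches are scattered between other keys, so one set of bounds on the compound index cannot cover every clause of the $or.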

Even if you .hint() the query, it cannot possibly find a range, and this is again due to the "exclusive" nature:

db.location.find(
    { "$or": [
        { "country_lc": /^unit/ },
        { "region_lc": /^unit/ },
        { "city_lc": /^unit/ }
    ]}
).hint(
    { country_lc: 1, region_lc: 1, city_lc: 1 }
).explain()

All I can think is that you do not actually mean "words that begin with 'unit'" and that you mean something else.

This is not just a MongoDB thing, this is a horrible thing to ask of any database engine.

You probably really want a specialized "text search" engine instead.

EDIT

Some people have posted uninformed responses, so I will actually post the explain output from the queries that were suggested:

{
    "cursor" : "BtreeCursor country_lc_1_region_lc_1_city_lc_1",
    "isMultiKey" : false,
    "n" : 1,
    "nscannedObjects" : 1,
    "nscanned" : 1,
    "nscannedObjectsAllPlans" : 1,
    "nscannedAllPlans" : 1,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "millis" : 0,
    "indexBounds" : {
            "country_lc" : [
                    [
                            {
                                    "$minElement" : 1
                            },
                            {
                                    "$maxElement" : 1
                            }
                    ]
            ],
            "region_lc" : [
                    [
                            {
                                    "$minElement" : 1
                            },
                            {
                                    "$maxElement" : 1
                            }
                    ]
            ],
            "city_lc" : [
                    [
                            {
                                    "$minElement" : 1
                            },
                            {
                                    "$maxElement" : 1
                            }
                    ]
            ]
    },
    "server" : "ubuntu:27017",
    "filterSet" : false
}

This clearly shows that even with an index selected you cannot possibly match anything within the index bounds.
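For contrast, a single anchored, case-sensitive prefix on one indexed field corresponds to a bounded string range, which is the kind of bound one would hope to see instead of $minElement and $maxElement. The sketch below computes that range in plain JavaScript. `prefixRange` and `inRange` are hypothetical helpers, and the code assumes plain ASCII values (such as the lowercased *_lc fields) whose last character is not the highest possible code unit.

```javascript
// Turn a literal prefix into a half-open string range [prefix, upper),
// where upper is the prefix with its last character bumped by one.
// Assumes ASCII input; this is an illustration, not MongoDB's own code.
function prefixRange(prefix) {
  if (prefix.length === 0) {
    throw new Error("prefix must be non-empty");
  }
  const last = prefix.charCodeAt(prefix.length - 1);
  const upper = prefix.slice(0, -1) + String.fromCharCode(last + 1);
  return { $gte: prefix, $lt: upper };
}

// Check whether a value falls inside the range using plain string comparison.
function inRange(value, range) {
  return value >= range.$gte && value < range.$lt;
}

const range = prefixRange("unit");
console.log(range);
console.log(inRange("unitize", range), inRange("universe", range));
```

Every string starting with "unit" falls inside the range and nothing else does, which is why a prefix regex on the leading field of an index can be served by a narrow scan when it is the only condition.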

And in further regard to the false comments that have been made: this explain output comes from the 2.6 release of MongoDB, and it is also replicated in the current nightly builds.
