Count() query in MongoDB with regex filter: slow performance

With MongoDB 2.6.5

I have a collection of documents with this structure:

{
  "_id" : ObjectId("5485cd0c6b0f96004220e414"),
  "exampleList" : [{
      "Value" : "uri:obj:id:1258477.479.129403280"
    },{
      "Value" : "uri:obj:id:1258477.542.542541247"
    }, {
      "Value" : "uri:obj:id:1258477.365.455255425"
    }
    [...]
    {
      "Value" : "uri:obj:id:1258477.147.855556255"
    }]
}

I have set a multikey index on "exampleList.Value".

I want to query it with a "starts with" regex, but the query can be very slow depending on the regex. The shorter the fixed part of the regex (and so the more results), the slower the query.

Demo with a collection of 100 million documents, "myCollection":

Fastest execution (immediate):

> db.myCollection.count({"exampleList.Value":{$regex:/^uri:obj:id:1258477\.479\.129403280$/}})
156

Fast execution (a few seconds):

> db.myCollection.count({"exampleList.Value":{$regex:/^uri:obj:id:1258477\.479\.129.*$/}})
502

Slower execution (a few seconds):

> db.myCollection.count({"exampleList.Value":{$regex:/^uri:obj:id:1258477\.479\.1.*$/}})
40947

Slow execution (~2 minutes):

> db.myCollection.count({"exampleList.Value":{$regex:/^uri:obj:id:1258477\.479.*$/}})
342275

Very very slooooowww execution (did not finish after several minutes):

> db.myCollection.count({"exampleList.Value":{$regex:/^uri:obj:id:1258477\.47.*$/}})

I don't understand why the processing time differs so much between these queries.

The first regex completes quickly because it consists only of literal characters between the anchors, so it can match only one exact value.

/^uri:obj:id:1258477\.479\.129403280$/

Compare that with the other regexes, which end in the greedy wildcard '.*'.

/^uri:obj:id:1258477\.47.*$/

This one has the shortest literal prefix. Across many millions of documents, a very large number of values may match that prefix, and each of them has to be visited to produce the count.
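The relationship between prefix length and work done can be sketched with a toy model. This is not MongoDB code: it assumes the index behaves like a sorted list of keys, and the names `lowerBound` and `countByPrefix` are made up for the illustration. Finding the first candidate is cheap, but counting means walking every key that shares the prefix.

```javascript
// Toy model of an ascending index: a sorted array of string keys.
// Hypothetical illustration only; it does not use any MongoDB API.

// Binary search: index of the first key that is >= target.
function lowerBound(sortedKeys, target) {
  let lo = 0;
  let hi = sortedKeys.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (sortedKeys[mid] < target) {
      lo = mid + 1;
    } else {
      hi = mid;
    }
  }
  return lo;
}

// Count keys starting with prefix by seeking to the first candidate
// and then walking forward until the prefix no longer matches.
// The loop runs once per matching key, so a shorter prefix means more work.
function countByPrefix(sortedKeys, prefix) {
  let i = lowerBound(sortedKeys, prefix);
  let matched = 0;
  while (i < sortedKeys.length && sortedKeys[i].startsWith(prefix)) {
    matched++;
    i++;
  }
  return matched;
}

// Sample keys in the style of the question (the fourth value is invented).
const keys = [
  "uri:obj:id:1258477.147.855556255",
  "uri:obj:id:1258477.365.455255425",
  "uri:obj:id:1258477.479.129403280",
  "uri:obj:id:1258477.479.133000000",
  "uri:obj:id:1258477.542.542541247",
].sort();

console.log(countByPrefix(keys, "uri:obj:id:1258477.479.129403280")); // 1
console.log(countByPrefix(keys, "uri:obj:id:1258477.479"));           // 2
console.log(countByPrefix(keys, "uri:obj:id:1258477."));              // 5
```

In this model the cost of a count grows with the number of matching keys, which is consistent with the timings in the question: 156 matches return immediately, 342275 take about two minutes.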

Try replacing the '.*' with a fixed length or a bounded range, e.g. '.{0,25}'. It may be quicker still to use a dedicated "begins with" string operation instead of a regex, if one is available.
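Two related variants may be worth testing as well. MongoDB's `$regex` documentation notes that `/^a/`, `/^a.*/` and `/^a.*$/` match the same strings, but that `/^a/` can stop scanning after matching the prefix. A "begins with" test can also be written as a plain string range. The sketch below builds both forms. The helper names `escapeRegex`, `prefixRegex` and `prefixRange` are hypothetical, and the range helper assumes a non-empty prefix whose last character is not the highest possible code unit.

```javascript
// Hypothetical helpers, written with `var` so they also load in an older mongo shell.

// Escape regex metacharacters so the prefix is matched literally.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Anchored prefix regex with no trailing ".*$".
function prefixRegex(prefix) {
  return new RegExp("^" + escapeRegex(prefix));
}

// Half-open string range [prefix, upper) covering every string that starts
// with prefix. upper is the prefix with its last character incremented by one.
function prefixRange(prefix) {
  if (prefix.length === 0) {
    throw new Error("prefix must not be empty");
  }
  var last = prefix.charCodeAt(prefix.length - 1);
  var upper = prefix.slice(0, -1) + String.fromCharCode(last + 1);
  return { $gte: prefix, $lt: upper };
}

// Possible usage in the mongo shell (untested against the asker's data):
// db.myCollection.count({ "exampleList.Value": prefixRegex("uri:obj:id:1258477.47") })
// db.myCollection.count({ exampleList: { $elemMatch: { Value: prefixRange("uri:obj:id:1258477.47") } } })
```

The `$elemMatch` in the second query is there so that both bounds apply to the same array element. Neither form reduces the number of matching values, so the count still grows with the result size. Whether either one is faster on a given deployment is an assumption to check with `explain()`.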
