MongoDB OR condition indexing

I have an $or query which I'm currently using for a semi-large update. Essentially my collection is split into two data sets:

1 main repository and 1 subset of the main repository. This is just to allow quicker searching on a small subset of the data.

I'm finding, however, that the query I use to pull things into the subset is timing out, and the explain() output suggests that two separate queries are actually happening.

PRIMARY> var date = new Date(2012,05,01);  // JS months are 0-indexed: this is 1 June 2012
PRIMARY> db.col.find(
  {"$or":[
      {"ldate":{"$gt":date}},
      {"keywords":{"$in":["Help","Support"]}}
   ]}).explain();

This produces:

{
    "clauses" : [
        {
            "cursor" : "BtreeCursor ldate_-1",
            "nscanned" : 1493872,
            "nscannedObjects" : 1493872,
            "n" : 1493872,
            "millis" : 1035194,
            "nYields" : 3396,
            "nChunkSkips" : 0,
            "isMultiKey" : false,
            "indexOnly" : false,
            "indexBounds" : {
                "ldate" : [
                    [
                        ISODate("292278995-01--2147483647T07:12:56.808Z"),
                        ISODate("2012-06-01T07:00:00Z")
                    ]
                ]
            }
        },
        {
            "cursor" : "BtreeCursor keywords_1 multi",
            "nscanned" : 88526,
            "nscannedObjects" : 88526,
            "n" : 2515,
            "millis" : 1071902,
            "nYields" : 56,
            "nChunkSkips" : 0,
            "isMultiKey" : false,
            "indexOnly" : false,
            "indexBounds" : {
                "keywords" : [
                    [
                        "Help",
                        "Help"
                    ],
                    [
                        "Support",
                        "Support"
                    ]
                ]
            }
        }
    ],
    "nscanned" : 1582398,
    "nscannedObjects" : 1582398,
    "n" : 1496387,
    "millis" : 1071902
}

Is there something I can index better to make this faster? It just seems way too slow...

Thanks ahead of time!

An $or query will evaluate each clause separately and combine the results to remove duplicates, so if you want to optimize the query you should first try to explain() each clause individually.
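
For example, a quick sketch in the shell, reusing the field names from the explain output above:

PRIMARY> db.col.find({"ldate":{"$gt":date}}).explain();
PRIMARY> db.col.find({"keywords":{"$in":["Help","Support"]}}).explain();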

It looks like part of the problem is that you are retrieving a large number of documents while actively writing to that collection, as evidenced by the high nYields (3396). It would be worth reviewing mongostat output while the query is running to consider other factors such as page faulting, lock %, and read/write queues.

If you want to make this query faster for a large number of documents and very active collection updates, two best practice approaches to consider are:

1) Pre-aggregation

Essentially this means updating aggregate stats as documents are inserted/updated, so you can make fast real-time queries. The MongoDB manual describes this use case in more detail: Pre-Aggregated Reports.
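
As a rough sketch of the idea (the stats_daily collection and the counter field names here are hypothetical, not from the question):

// Hypothetical sketch: bump counters in one summary document per day as
// events are written, so reporting reads become a single indexed lookup.
db.stats_daily.update(
    { _id: "2012-06-01" },                       // one summary document per day
    { $inc: { total: 1, "keywords.Help": 1 } },  // increment the relevant counters
    { upsert: true }                             // create the document on first write
);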

2) Incremental Map/Reduce

An incremental Map/Reduce approach can be used to calculate aggregate stats in successive batches (for example, from an hourly or daily cron job). With this approach you perform a Map/Reduce using the reduce output option to save results to a new collection, and include a query filter that only selects documents that have been created/updated since the last time this Map/Reduce job was run.
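
A minimal sketch, assuming documents carry the ldate timestamp from the question and that the cron job persists its own lastRun marker between runs:

var lastRun = ISODate("2012-06-01T00:00:00Z");  // in practice, loaded from a saved checkpoint

var mapFn = function () {
    // emit a count of 1 per keyword on this document
    (this.keywords || []).forEach(function (k) { emit(k, 1); });
};

var reduceFn = function (key, values) {
    return Array.sum(values);  // Array.sum is available in MongoDB's JS engine
};

db.col.mapReduce(mapFn, reduceFn, {
    query: { ldate: { $gt: lastRun } },   // only docs created/updated since the last run
    out: { reduce: "keyword_counts" }     // merge new counts into the existing output collection
});

Each run then only scans the new slice of the collection rather than all ~1.5 million matching documents.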

I think you should create a compound index on both the date and keywords fields. Refer to the post below for more specifics based on your use case:

how to structure a compound index in mongodb
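
For example, in the shell (ensureIndex was the index-creation helper at the time; the field names follow the explain output above, and you should verify the result with explain()):

db.col.ensureIndex({ ldate: -1, keywords: 1 });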
