
MongoDB - Querying performance for over 10 million records

First of all: I have already read a lot of posts about MongoDB query performance, but I haven't found a good solution.

Inside the collection, the document structure looks like:

{
    "_id" : ObjectId("535c4f1984af556ae798d629"),
    "point" : [
        -4.372925494081455,
        41.367710205649544
    ],
    "location" : [
        {
            "x" : -7.87297955453618,
            "y" : 73.3680160842939
        },
        {
            "x" : -5.87287143362673,
            "y" : 73.3674043270052
        }
    ],
    "timestamp" : NumberLong("1781389600000")
}

My collection already has an index:

db.collection.ensureIndex({timestamp:-1})

Query looks like:

db.collection.find({ "timestamp" : { "$gte" : 1380520800000 , "$lte" : 1380546000000}})

Despite this, the response time is too high, about 20 - 30 seconds (the exact time depends on the specified query parameters).

Any help is useful!

Thanks in advance.

EDIT: I changed the find parameters, replacing them with real data.

The above query takes 46 seconds; this is the output of the explain() function:

{
    "cursor" : "BtreeCursor timestamp_1",
    "isMultiKey" : false,
    "n" : 124494,
    "nscannedObjects" : 124494,
    "nscanned" : 124494,
    "nscannedObjectsAllPlans" : 124494,
    "nscannedAllPlans" : 124494,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 45,
    "nChunkSkips" : 0,
    "millis" : 46338,
    "indexBounds" : {
        "timestamp" : [
            [
                1380520800000,
                1380558200000
            ]
        ]
    },
    "server" : "ip-XXXXXXXX:27017"
}

The explain output could hardly be more ideal. You found 124,494 documents via the index ( nscanned ), and they were all valid results, so they were all returned ( n ). It still wasn't an index-only (covered) query, because the query returns whole documents, which contain fields that are not part of the timestamp index.

The reason this query is slow is most likely the sheer amount of data it returns. Every matching document must be read from the hard drive (when the collection is cold), scanned, serialized, sent to the client over the network, and deserialized by the client.
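If you only need some of the fields, a projection reduces the amount of data that has to be serialized and transferred. A minimal sketch in the mongo shell, using the field names from the document above (projecting only the indexed timestamp field can even turn this into a covered, index-only query):

```javascript
// Return only timestamp and point; exclude _id explicitly.
db.collection.find(
    { "timestamp" : { "$gte" : 1380520800000, "$lte" : 1380546000000 } },
    { "timestamp" : 1, "point" : 1, "_id" : 0 }
)
```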

Do you really need that much data for your use case? If the answer is yes, does responsiveness really matter? I do not know what kind of application you actually want to create, but I am wildly guessing that yours is one of three use cases:

  1. You want to show all that data in the form of some kind of report. That would mean the output is a huge list the user has to scroll through. In that case I would recommend using pagination: only load as much data as fits on one screen, and provide next and previous buttons. MongoDB pagination can be done with the cursor methods .limit(n) and .skip(n) .
  2. The above, but it is some kind of offline report the user can download and then examine with all kinds of data-mining tools. In that case the initial load time would be acceptable, because the user will spend some time with the data they received.
  3. You don't want to show all of that raw data to the user, but process it and present it in some aggregated form, like a statistic or a diagram. In that case you could likely do all that work on the database already, with the aggregation framework.
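Options 1 and 3 can be sketched in the mongo shell. This is only an illustration: the page size and the per-hour grouping key below are made-up examples, not something from the question.

```javascript
// Option 1: pagination. Sort by the indexed timestamp and fetch one page.
var pageSize = 50;
db.collection.find({ "timestamp" : { "$gte" : 1380520800000, "$lte" : 1380546000000 } })
    .sort({ "timestamp" : -1 })
    .limit(pageSize);
// For deep pages, remembering the last timestamp seen and querying
// { "timestamp" : { "$lt" : lastSeen } } scales better than a large .skip(n).

// Option 3: aggregate on the server instead of shipping raw documents,
// e.g. count documents per hour (3600000 ms):
db.collection.aggregate([
    { $match : { "timestamp" : { "$gte" : 1380520800000, "$lte" : 1380546000000 } } },
    { $group : {
        _id   : { $subtract : [ "$timestamp", { $mod : [ "$timestamp", 3600000 ] } ] },
        count : { $sum : 1 }
    } }
]);
```

With the aggregation, only the small per-hour summary leaves the database, instead of 124,494 full documents.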
