
Very slow MongoDB count in big database

I have a database with a collection that contains a large number of documents (several million). Among other fields, these documents have _VIOLATIONTYPE (int) and _DURATION (int). I would like to count the documents that have a _VIOLATIONTYPE of 15 or less and a _DURATION of less than 10. To do this, I run the following Python script:

#!/usr/bin/env python
import pymongo
import timeit


client = pymongo.MongoClient('localhost', 27017)
database = client['bgp_route_leaks']

collection = database['valleys']

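# One single-field index on each filtered field
# (create_index replaces ensure_index, which is deprecated in PyMongo 3 and removed in 4)
collection.create_index('_VIOLATIONTYPE', unique=False)
collection.create_index('_DURATION', unique=False)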

start = timeit.default_timer()

cursor = collection.find({'$and': [{'_VIOLATIONTYPE': {'$lt': 16}}, {'_DURATION': {'$lt': 10}}]}, {'_DURATION': 1, '_id': 0})

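# Note: the timed span includes both explain() and count()
print('Explain: {}'.format(cursor.explain()))
# Cursor.count() is the legacy PyMongo API (deprecated in 3.7, removed in 4.0)
print('Count: {}'.format(cursor.count()))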
print('Time: {}'.format(timeit.default_timer() - start))

This prints out:

Explain: {u'nYields': 4, u'nscannedAllPlans': 6244545, u'allPlans': [{u'cursor': u'BtreeCursor _VIOLATIONTYPE_1', u'indexBounds': {u'_VIOLATIONTYPE': [[-1.7976931348623157e+308, 16]]}, u'nscannedObjects': 124, u'nscanned': 124, u'n': 34}, {u'cursor': u'BtreeCursor _DURATION_1', u'indexBounds': {u'_DURATION': [[-1.7976931348623157e+308, 10]]}, u'nscannedObjects': 6244298, u'nscanned': 6244298, u'n': 5678070}, {u'cursor': u'BasicCursor', u'indexBounds': {}, u'nscannedObjects': 123, u'nscanned': 123, u'n': 36}], u'millis': 30815, u'nChunkSkips': 0, u'server': u'area51:27017', u'n': 5678107, u'cursor': u'BtreeCursor _DURATION_1', u'scanAndOrder': False, u'indexBounds': {u'_DURATION': [[-1.7976931348623157e+308, 10]]}, u'nscannedObjectsAllPlans': 6244545, u'isMultiKey': False, u'indexOnly': True, u'nscanned': 6244298, u'nscannedObjects': 6244298}
Count: 5678107
Time: 52.4030768871

While the script was running, I also executed db.currentOp() in another window, which returned:

{
        "inprog" : [
                {
                        "opid" : 15,
                        "active" : true,
                        "secs_running" : 4,
                        "op" : "query",
                        "ns" : "bgp_route_leaks.valleys",
                        "query" : {
                                "$query" : {
                                        "$and" : [
                                                {
                                                        "_VIOLATIONTYPE" : {
                                                                "$lt" : 16
                                                        }
                                                },
                                                {
                                                        "_DURATION" : {
                                                                "$lt" : 10
                                                        }
                                                }
                                        ]
                                },
                                "$explain" : true
                        },
                        "client" : "127.0.0.1:46819",
                        "desc" : "conn1",
                        "threadId" : "0x7fd69b31c700",
                        "connectionId" : 1,
                        "locks" : {
                                "^" : "r",
                                "^bgp_route_leaks" : "R"
                        },
                        "waitingForLock" : false,
                        "numYields" : 5,
                        "lockStats" : {
                                "timeLockedMicros" : {
                                        "r" : NumberLong(8816104),
                                        "w" : NumberLong(0)
                                },
                                "timeAcquiringMicros" : {
                                        "r" : NumberLong(4408723),
                                        "w" : NumberLong(0)
                                }
                        }
                }
        ]
}

I have read that missing indexes are the most common cause of slow MongoDB queries. However, I have created indexes on both _VIOLATIONTYPE and _DURATION, and explain reports u'indexOnly': True. I also read that a NUMA architecture can slow things down and that I should start the service with the command

sudo numactl --interleave=all /usr/bin/mongod --dbpath=/var/lib/mongodb
(/proc/sys/vm/zone_reclaim_mode is already set to 0)

which I have now done, but this count still takes about a minute (and other counts take even longer), so I am wondering what I can do to make the query faster.

I have also tried running

db.runCommand({compact: 'bgp_route_leaks'})

in the mongo shell, with no luck.

Any suggestions on how to get the counts much faster?

The MongoDB version is 2.4.9.

If you look at your explain output, you will see that the candidate plan using the _VIOLATIONTYPE index scanned only 124 objects, while the plan using the _DURATION index (the one that was chosen) scanned 6244298 objects.
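To line the candidate plans up side by side, the explain document can be read programmatically. This is a minimal sketch: `summarize_plans` is a hypothetical helper, and it assumes the legacy MongoDB 2.4 explain format with an `allPlans` list (newer servers report plans under `queryPlanner` instead). The numbers are copied from the explain output in the question, with the other keys omitted.

```python
def summarize_plans(explain):
    """Return (cursor, nscanned, n) for each candidate plan in a
    MongoDB 2.4-style explain document (its 'allPlans' list)."""
    return [(p['cursor'], p['nscanned'], p['n']) for p in explain.get('allPlans', [])]


# Subset of the explain output from the question; other keys omitted.
explain = {
    'cursor': 'BtreeCursor _DURATION_1',
    'allPlans': [
        {'cursor': 'BtreeCursor _VIOLATIONTYPE_1', 'nscanned': 124, 'n': 34},
        {'cursor': 'BtreeCursor _DURATION_1', 'nscanned': 6244298, 'n': 5678070},
        {'cursor': 'BasicCursor', 'nscanned': 123, 'n': 36},
    ],
}

for cursor, nscanned, n in summarize_plans(explain):
    print(f"{cursor:<30} nscanned={nscanned:>9} n={n:>9}")
print("winning plan:", explain['cursor'])
```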

Although MongoDB 2.6+ can use index intersection (which your version, 2.4.9, does not support), a compound index will generally be faster.

You need to create a compound index on those fields:

collection.create_index([('_VIOLATIONTYPE', pymongo.ASCENDING), ('_DURATION', pymongo.ASCENDING)])
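A toy model of why the compound index helps: sorted Python lists searched with `bisect` stand in for B-tree indexes, and the data is randomly generated. All function names are hypothetical, and the per-value seek in `scan_compound_index` is an assumption about how a B-tree cursor with bounds on both fields can skip ahead. This is not MongoDB's implementation.

```python
import bisect
import random


def build_indexes(docs):
    """Sorted key lists standing in for B-tree indexes (toy model only)."""
    duration_index = sorted(dur for _, dur in docs)  # like an index on _DURATION
    compound_index = sorted(docs)                    # like (_VIOLATIONTYPE, _DURATION)
    return duration_index, compound_index


def scan_duration_index(duration_index, dur_max):
    """Number of keys a _DURATION-only index walks for _DURATION < dur_max.
    Each of them would still need its document checked for _VIOLATIONTYPE."""
    return bisect.bisect_left(duration_index, dur_max)


def scan_compound_index(compound_index, vt_max, dur_max):
    """Count keys with _VIOLATIONTYPE < vt_max and _DURATION < dur_max.

    Assumes integer _VIOLATIONTYPE values: for each value, read only the
    contiguous run of keys with _DURATION < dur_max, then seek to the next
    value. Returns (matches, seeks).
    """
    matches = seeks = pos = 0
    while pos < len(compound_index) and compound_index[pos][0] < vt_max:
        vt = compound_index[pos][0]
        stop = bisect.bisect_left(compound_index, (vt, dur_max), lo=pos)
        matches += stop - pos
        # Skip this value's remaining keys (those with _DURATION >= dur_max).
        pos = bisect.bisect_left(compound_index, (vt + 1,), lo=stop)
        seeks += 1
    return matches, seeks


# Hypothetical random data, not the asker's collection.
random.seed(0)
docs = [(random.randint(0, 30), random.randint(0, 60)) for _ in range(100_000)]
duration_index, compound_index = build_indexes(docs)

print("keys walked via _DURATION index:", scan_duration_index(duration_index, 10))
print("matches, seeks via compound index:", scan_compound_index(compound_index, 16, 10))
```

In this model, the single-field index has to walk every key with a small duration, while the compound index confines the work to keys that satisfy both predicates, plus one seek per distinct _VIOLATIONTYPE value.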

EDIT

Count performance was significantly improved in MongoDB 2.4 (JIRA-1752).

It is also worth noting that the explain output describes the find query, not the count command.

Unfortunately, in MongoDB 2.4 you can't run explain on a count command, but there is an open ticket for that issue.

To measure the performance of the count command alone, remove explain from your test: the Time value above also includes the explain run, which by itself reported millis: 30815. You should also repeat the query many times (100x, 1000x, ...) and take the average to get a more reliable value.
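A minimal sketch of the repeat-and-average approach using only the standard library. `average_runtime` is a hypothetical helper name. It times the callable on its own, so no explain call is mixed into the measurement. The stand-in workload lets the sketch run without a server.

```python
import statistics
import timeit


def average_runtime(fn, repeat=100):
    """Call fn() `repeat` times; return (mean, stdev) of the per-call seconds."""
    if repeat < 2:
        raise ValueError("repeat must be at least 2 to compute a standard deviation")
    timings = timeit.repeat(fn, number=1, repeat=repeat)
    return statistics.mean(timings), statistics.stdev(timings)


# Stand-in workload. Against MongoDB you would pass something like
#   lambda: collection.count_documents(query)   # PyMongo 3.7+
#   lambda: collection.find(query).count()      # older PyMongo
mean, stdev = average_runtime(lambda: sum(range(10_000)), repeat=20)
print(f"mean {mean:.6f}s, stdev {stdev:.6f}s")
```

The first few runs against a real collection may be slower if the data still has to be read from disk, so it can be worth discarding a warm-up run before averaging.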
