I have a database with a collection containing a large number of documents (several million). Among other fields, the documents have _VIOLATIONTYPE (int) and _DURATION (int). I would like to count the number of documents that have a _VIOLATIONTYPE of 15 or less and a _DURATION of 10 or less. To do this I execute the following Python script:
#!/usr/bin/env python
import pymongo
import timeit
client = pymongo.MongoClient('localhost', 27017)
database = client['bgp_route_leaks']
collection = database['valleys']
collection.ensure_index('_VIOLATIONTYPE', unique=False)
collection.ensure_index('_DURATION', unique=False)
start = timeit.default_timer()
cursor = collection.find({'$and': [{'_VIOLATIONTYPE': {'$lt': 16}}, {'_DURATION': {'$lt': 10}}]}, {'_DURATION': 1, '_id': 0})
print('Explain: {}'.format(cursor.explain()))
print('Count: {}'.format(cursor.count()))
print('Time: {}'.format(timeit.default_timer() - start))
This prints out:
Explain: {u'nYields': 4, u'nscannedAllPlans': 6244545, u'allPlans': [{u'cursor': u'BtreeCursor _VIOLATIONTYPE_1', u'indexBounds': {u'_VIOLATIONTYPE': [[-1.7976931348623157e+308, 16]]}, u'nscannedObjects': 124, u'nscanned': 124, u'n': 34}, {u'cursor': u'BtreeCursor _DURATION_1', u'indexBounds': {u'_DURATION': [[-1.7976931348623157e+308, 10]]}, u'nscannedObjects': 6244298, u'nscanned': 6244298, u'n': 5678070}, {u'cursor': u'BasicCursor', u'indexBounds': {}, u'nscannedObjects': 123, u'nscanned': 123, u'n': 36}], u'millis': 30815, u'nChunkSkips': 0, u'server': u'area51:27017', u'n': 5678107, u'cursor': u'BtreeCursor _DURATION_1', u'scanAndOrder': False, u'indexBounds': {u'_DURATION': [[-1.7976931348623157e+308, 10]]}, u'nscannedObjectsAllPlans': 6244545, u'isMultiKey': False, u'indexOnly': True, u'nscanned': 6244298, u'nscannedObjects': 6244298}
Count: 5678107
Time: 52.4030768871
While running this I also executed db.currentOp() in another window, which returned
{
"inprog" : [
{
"opid" : 15,
"active" : true,
"secs_running" : 4,
"op" : "query",
"ns" : "bgp_route_leaks.valleys",
"query" : {
"$query" : {
"$and" : [
{
"_VIOLATIONTYPE" : {
"$lt" : 16
}
},
{
"_DURATION" : {
"$lt" : 10
}
}
]
},
"$explain" : true
},
"client" : "127.0.0.1:46819",
"desc" : "conn1",
"threadId" : "0x7fd69b31c700",
"connectionId" : 1,
"locks" : {
"^" : "r",
"^bgp_route_leaks" : "R"
},
"waitingForLock" : false,
"numYields" : 5,
"lockStats" : {
"timeLockedMicros" : {
"r" : NumberLong(8816104),
"w" : NumberLong(0)
},
"timeAcquiringMicros" : {
"r" : NumberLong(4408723),
"w" : NumberLong(0)
}
}
}
]
}
Now I have read that the most common source for slow MongoDB queries is missing indexes. However, I have ensured indexes for both _VIOLATIONTYPE and _DURATION and the explain tells me u'indexOnly': True. I also read that a NUMA architecture could slow things down and that I should start the service via the command
sudo numactl --interleave=all /usr/bin/mongod --dbpath=/var/lib/mongodb
(/proc/sys/vm/zone_reclaim_mode is already set to 0)
which I have now done, but it still takes about a minute for this count and even longer for others, so I was wondering what I can do to make the query faster.
Running
db.runCommand({compact: 'bgp_route_leaks'})
in the mongo shell has also been tried with no luck.
Any suggestions on how to get the counts much faster?
The MongoDB version is 2.4.9.
If you look at your explain output, you will see that the plan using the _VIOLATIONTYPE index scanned only 124 objects, while the plan using the _DURATION index, the one that was actually chosen, scanned 6244298 objects.
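To make the per-plan numbers easier to compare, here is a minimal sketch that tabulates them. The figures are copied from the allPlans section of the explain output in the question (other keys omitted); the helper name summarize_plans is made up for illustration and is not part of pymongo.

```python
# Per-plan figures copied from the explain output in the question
# (only the keys needed here are kept).
all_plans = [
    {'cursor': 'BtreeCursor _VIOLATIONTYPE_1', 'nscanned': 124, 'n': 34},
    {'cursor': 'BtreeCursor _DURATION_1', 'nscanned': 6244298, 'n': 5678070},
    {'cursor': 'BasicCursor', 'nscanned': 123, 'n': 36},
]
# Top-level 'cursor' value of the explain output, i.e. the plan that was chosen.
winning_cursor = 'BtreeCursor _DURATION_1'


def summarize_plans(plans):
    """Return (cursor, nscanned, n) tuples, largest nscanned first."""
    rows = [(p['cursor'], p['nscanned'], p['n']) for p in plans]
    return sorted(rows, key=lambda row: row[1], reverse=True)


summary = summarize_plans(all_plans)
for cursor, nscanned, n in summary:
    marker = ' (chosen)' if cursor == winning_cursor else ''
    print('{:<30} nscanned={:>8} n={:>8}{}'.format(cursor, nscanned, n, marker))

# The per-plan nscanned values add up to the reported nscannedAllPlans.
total_scanned = sum(p['nscanned'] for p in all_plans)
print('nscannedAllPlans: {}'.format(total_scanned))
```

The sum of the three nscanned values equals the nscannedAllPlans figure in the output, so that larger number is the total across all candidate plans rather than the work done by the _DURATION plan alone.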
Although MongoDB 2.6+ can use index intersection, a compound index will generally be faster.
You need to create a compound index on those fields:
collection.create_index([('_VIOLATIONTYPE', pymongo.ASCENDING), ('_DURATION', pymongo.ASCENDING)])
EDIT
In version 2.4, MongoDB's count performance was significantly improved (JIRA-1752).
Also, it's worth noting that the explain command displays details for a query, not for a count command. Unfortunately, you can't use explain on a count command, but there is a ticket open for that issue.
To measure the performance of the count command alone, you should probably remove explain from your test. You also need to repeat the query multiple times (100x, 1000x...) and take the average to get a reliable value.
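A minimal sketch of such a repeated measurement, using only the standard timeit module. The function placeholder_count is a hypothetical stand-in so the snippet runs without a database; in the real test it would be replaced by the actual count call (for example, a function that runs the question's find(...).count() without the explain step).

```python
import timeit


def average_seconds(operation, repeats=100):
    """Call operation() `repeats` times and return the mean seconds per call."""
    total = timeit.timeit(operation, number=repeats)
    return total / repeats


def placeholder_count():
    # Hypothetical stand-in for the real count call against MongoDB.
    return sum(1 for value in range(100000) if value < 10)


mean = average_seconds(placeholder_count, repeats=20)
print('Average per call: {:.6f} s'.format(mean))
```

Passing a callable to timeit.timeit keeps the setup (connecting, building the query) outside the timed region, so only the operation itself is measured.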