I am trying to benchmark MongoDB performance and I am having trouble understanding how MongoDB executes queries, specifically how long they take to complete.
If I run the following code:
import datetime
from pymongo import MongoClient

# Connect to the database
client = MongoClient("mongodb://.../testrecords")
db = client.testrecords
start = datetime.datetime.now()
result = db.threads.find( {"$and": [{ "location" : "JC018" }, {"timestamp": "2018-03-22T23:05:15+00:00"} ] } ).explain()
endtime = datetime.datetime.now()
print("duration: " + str(endtime - start))
print(result)
I receive the following output: duration: 0:00:00.531754. I also get the results of the explain() call, which report executionTimeMillis: 249.
This makes sense, as the time MongoDB takes to execute the query is less than the total round-trip time.
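To compare the server-side figure with the client-side round trip, the executionTimeMillis value can be read out of the explain() result programmatically. This is a minimal sketch, assuming a non-sharded deployment where the explain output is a dict with a top-level executionStats sub-document; server_time_ms is a hypothetical helper name, and the sample dict below is hand-written, not real server output.

```python
from typing import Any, Mapping, Optional


def server_time_ms(explain_output: Mapping[str, Any]) -> Optional[int]:
    """Return executionTimeMillis from an explain() result, or None if absent.

    Assumption: a non-sharded deployment, where the timing lives under the
    top-level "executionStats" sub-document.
    """
    stats = explain_output.get("executionStats")
    if stats is None:
        # e.g. queryPlanner-only verbosity or a different (sharded) layout
        return None
    return stats.get("executionTimeMillis")


# Hand-written dict shaped like an explain() result, reusing the 249 ms above.
sample = {"queryPlanner": {}, "executionStats": {"executionTimeMillis": 249}}
print(server_time_ms(sample))  # prints 249
```

With a live connection the call would be server_time_ms(db.threads.find(query_filter).explain()), where query_filter is the filter dict from the question.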
However, if I use the following loop to run the same query 10,000 times, the duration is consistently recorded as between 200 and 300 milliseconds. (Note that I have removed the explain() call.) I fail to see how running the query 10,000 times can result in no meaningful increase in execution time.
for i in range(10000):
    result = db.threads.find( {"$and": [{ "location" : "JC018" }, {"timestamp": "2018-03-22T23:05:15+00:00"} ] } )
However, if I run the loop with the explain() call, it does take approximately n * 250 ms to execute:
for i in range(n):
    result = db.threads.find( {"$and": [{ "location" : "JC018" }, {"timestamp": "2018-03-22T23:05:15+00:00"} ] } ).explain()
Can anyone explain the lack of a time difference between executing the query once and executing it 10,000 times, and why adding explain() to the loop appears to result in the expected execution time?
I thought there might be some kind of caching going on, but I am only using PyMongo on the client side and cannot find any mention of this in the documentation.
Thanks
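So after more research I discovered that find() doesn't return the results from the database at all. It returns a Cursor object, which is a reference to the result set and can be iterated over. PyMongo does not send the query to the server until the cursor is iterated (or a method such as explain() is called on it), which is why the loop without explain() finished so quickly: it only created 10,000 Cursor objects. Once the cursor is iterated, the server returns the results in batches (by default the first batch holds up to 101 documents), and the cursor fetches further batches as needed.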
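The lazy, batched behaviour of a cursor (in PyMongo, no query is sent until iteration starts) can be illustrated with a toy model. This is a hypothetical sketch, not PyMongo's implementation: ToyServer and ToyCursor are invented names, the "server" is an in-memory list, and every batch uses the same fixed size of 101, a simplification of how real batches are sized.

```python
from typing import Callable, Dict, Iterator, List


class ToyServer:
    """Hypothetical server holding an in-memory result set and counting round trips."""

    def __init__(self, docs: List[Dict]):
        self.docs = docs
        self.round_trips = 0

    def fetch_batch(self, offset: int, limit: int) -> List[Dict]:
        self.round_trips += 1  # each call stands for one network round trip
        return self.docs[offset:offset + limit]


class ToyCursor:
    """Hypothetical stand-in for a driver cursor: fetches nothing until iterated."""

    def __init__(self, fetch_batch: Callable[[int, int], List[Dict]], batch_size: int = 101):
        self._fetch_batch = fetch_batch
        self._batch_size = batch_size

    def __iter__(self) -> Iterator[Dict]:
        offset = 0
        while True:
            batch = self._fetch_batch(offset, self._batch_size)
            yield from batch
            offset += len(batch)
            if len(batch) < self._batch_size:
                return  # a short (or empty) batch means the result set is exhausted


server = ToyServer([{"_id": i} for i in range(250)])

# Like the 10,000-iteration loop in the question: cursors are created, never iterated.
for _ in range(10000):
    cursor = ToyCursor(server.fetch_batch)
print(server.round_trips)  # prints 0

# Draining one cursor pulls 101 + 101 + 48 documents in three round trips.
results = list(ToyCursor(server.fetch_batch))
print(len(results), server.round_trips)  # prints 250 3
```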
So to actually fetch all of the results from the database one would use the following code:
results = []
for doc in db.threads.find( { "timestamp": { "$gt": "2018-02-20T20:08:00+00:00", "$lt": "2018-02-20T22:54:42.3+00:00"} } ):
    results.append(doc)
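For benchmarking, the timed region therefore has to include iterating the cursor, not just the find() call. A minimal sketch under these assumptions: time_query is a hypothetical helper that accepts any object with a PyMongo-style find(filter) method (such as db.threads), drains the cursor with list() on every run, and uses time.perf_counter(), which is intended for measuring short intervals, instead of datetime.datetime.now().

```python
import time
from typing import Any, Dict, List


def time_query(collection: Any, query_filter: Dict[str, Any], runs: int = 10) -> List[float]:
    """Run the query `runs` times and return each run's wall-clock duration in seconds.

    Each run fully iterates the cursor, so the measurement covers the server
    executing the query plus every batch travelling back to the client.
    """
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        list(collection.find(query_filter))  # iterating forces the query to run
        durations.append(time.perf_counter() - start)
    return durations


# Usage against a live server (requires `import statistics`):
# query_filter = {"$and": [{"location": "JC018"}, {"timestamp": "2018-03-22T23:05:15+00:00"}]}
# durations = time_query(db.threads, query_filter, runs=100)
# print("median: %.1f ms" % (statistics.median(durations) * 1000))
```

Returning the individual durations rather than a single total makes it easy to look at the median or spread, since the first run may be slower than the rest.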