MongoDB Java driver: converting a large result set to JSON

I am running a query against ~120 million records in MongoDB. The query executes quickly both in the mongo shell and through the Java driver. However, when I try to convert the result to JSON through the Java driver, it is very slow: the query takes < 100 ms, but the conversion to JSON takes > 30 s. The result set has about 5k items. I'm doing the conversion with JSON.serialize(cursor).

While I expect converting to a JSON string to take a little time, if I run the query from the shell like this:

var cursor = //execute query
var arr = cursor.toArray();
arr

It prints out very quickly.

The mongo server stats are reporting an increasing number of page faults during the serializing process, but I have increased my RAM to be much larger than the entire collection plus indexes.

Any thoughts on what might be happening here and how to speed up the conversion to JSON?

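The query had not actually executed when you got the cursor; the driver only sends it to the server once you start iterating. Even if it had executed, you would have received only the first batch, a small fraction of the results. So the < 100 ms you measured covers creating the cursor, and the real work of fetching the documents happens during serialization.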
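To see where the time actually goes, the time spent pulling documents from the cursor can be measured separately from the time spent serializing them. The following is a minimal sketch using only the standard library. `CursorTiming`, `measure`, and the stand-in iterator are hypothetical names, not driver API. With the legacy driver, a `DBCursor` (which implements `Iterator<DBObject>`) and a per-document call to `JSON.serialize` should be usable in place of the stand-ins.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.function.Function;

final class CursorTiming {

    /** Elapsed time per phase, plus the number of documents seen. */
    static final class Result {
        final long fetchNanos;
        final long serializeNanos;
        final long count;

        Result(long fetchNanos, long serializeNanos, long count) {
            this.fetchNanos = fetchNanos;
            this.serializeNanos = serializeNanos;
            this.count = count;
        }
    }

    /**
     * Iterates the cursor once, timing hasNext()/next() (fetching)
     * separately from toJson (serialization). Documents are not retained.
     */
    static <T> Result measure(Iterator<T> cursor, Function<T, String> toJson) {
        long fetchNanos = 0;
        long serializeNanos = 0;
        long count = 0;
        while (true) {
            long t0 = System.nanoTime();
            boolean more = cursor.hasNext(); // on a real cursor this may hit the server
            T doc = more ? cursor.next() : null;
            long t1 = System.nanoTime();
            fetchNanos += t1 - t0;
            if (!more) {
                break;
            }
            toJson.apply(doc);
            serializeNanos += System.nanoTime() - t1;
            count++;
        }
        return new Result(fetchNanos, serializeNanos, count);
    }

    public static void main(String[] args) {
        // Stand-in for a cursor: strings that are already JSON documents.
        Iterator<String> fakeCursor =
                Arrays.asList("{\"a\":1}", "{\"a\":2}", "{\"a\":3}").iterator();
        Result r = measure(fakeCursor, Function.identity());
        System.out.printf("docs=%d fetch=%d ms serialize=%d ms%n",
                r.count, r.fetchNanos / 1_000_000, r.serializeNanos / 1_000_000);
    }
}
```

If the fetch time dominates, the bottleneck is the server reading documents (consistent with the page faults), not the JSON conversion itself.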

Generally, I would advise against loading all of the results into memory, whether with toArray() or by serializing to a single in-memory string. 50K documents will take a lot of client-side memory, and allocating that memory will not be very efficient either.

If you are stuck with the 10gen Java driver, you will need to wait for JAVA-709 to be resolved to get a streaming write capability. The Asynchronous Java Driver already supports writing to a stream.
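In the meantime, one possible workaround (not part of the original answer) is to iterate the cursor yourself and write each document to a `Writer` as it arrives, so that only one serialized document is held in memory at a time. This is a sketch using only the standard library. `StreamingJsonArray` and `write` are hypothetical names, and the iterator of strings stands in for a real cursor. With the legacy driver, the cursor and a per-document `JSON.serialize` call would presumably take their place.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.Arrays;
import java.util.Iterator;
import java.util.function.Function;

final class StreamingJsonArray {

    /**
     * Writes the items as a JSON array, one element at a time.
     * toJson must return a valid JSON value for each item.
     * Returns the number of elements written.
     */
    static <T> long write(Iterator<T> items, Function<T, String> toJson, Writer out)
            throws IOException {
        long count = 0;
        out.write('[');
        while (items.hasNext()) {
            if (count > 0) {
                out.write(','); // separator between elements, none before the first
            }
            out.write(toJson.apply(items.next()));
            count++;
        }
        out.write(']');
        out.flush();
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a cursor: strings that are already JSON documents.
        Iterator<String> fakeCursor =
                Arrays.asList("{\"a\":1}", "{\"a\":2}").iterator();
        StringWriter out = new StringWriter();
        long n = write(fakeCursor, Function.identity(), out);
        System.out.println(n + " documents: " + out);
    }
}
```

In practice the `Writer` would wrap a file or an HTTP response stream instead of a `StringWriter`, ideally through a `BufferedWriter`, so the full JSON text never has to exist in client memory.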

If you can use an external program, you might want to look at mongoexport. It can write JSON to a file or to stdout and should be close to optimal performance-wise.

The page faults are normal the first time the query runs. The second time, if the server has enough memory to keep the entire data set in memory, you should see very few page faults. If you are running the client on the same machine as the server, the client could be pushing the data out of memory as it allocates the memory it needs for the JSON blob.

HTH - Rob

It turns out it is taking just as long from the mongo shell. When I tested from the shell, the results must have been cached, so I thought I was seeing better results from the shell, but that is not true in my case.
