Why does MongoDB *client* use more memory than the server in this case?

I'm evaluating MongoDB. I have a small 20GB subset of documents. Each is essentially a request log for a social game along with some captured state of the game the user was playing at that moment.

I thought I'd try finding game cheaters, so I wrote a function that runs server-side. It calls find() on an indexed collection, sorts according to the existing index {user_id, time}, and walks all documents in index order using a cursor. That way I go through each user's history in time order, checking whether certain values (money/health/etc.) increase faster than is possible in the game. The script returns the first violation found; it does not collect violations.
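A minimal sketch of that function (the collection and field names here are simplified placeholders, and maxMoneyPerMs is an illustrative limit, not my real check):

function findFirstViolation() {
    var maxMoneyPerMs = 1;    // illustrative: fastest legitimate money gain
    var prev = null;
    // walk every document in index order: grouped by user, chronological
    var cursor = db.requests.find().sort({user_id: 1, time: 1});
    while (cursor.hasNext()) {
        var doc = cursor.next();
        if (prev !== null && prev.user_id === doc.user_id) {
            // did money grow faster between two requests than the game allows?
            if (doc.money - prev.money > maxMoneyPerMs * (doc.time - prev.time)) {
                return {user_id: doc.user_id, logId: doc._id};   // first violation
            }
        }
        prev = doc;
    }
    return null;   // no violation found
}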

The ONLY thing this script does on the client is define the function and call mymongodb.eval(myscript) against a mongod instance on another box.
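In other words, the entire client side boils down to something like this (assuming the mongo shell; the database name is a placeholder):

// connect to the remote box, ship the function over, and wait for the result
var conn = new Mongo("remotehost:27017");
var gamedb = conn.getDB("gamelogs");      // placeholder database name
printjson(gamedb.eval(findFirstViolation));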

The box that mongod is running on does fine. The one the script is launched from starts running out of memory and then swap. Hours later, 8GB of RAM and 6GB of swap are in use on the client machine, which did nothing more than launch a script on another box and wait for a return value.

Is the mongo client really that flaky? Have I done something wrong, or made an incorrect assumption about mongo/mongod?

If you just want to open a client connection to a remote database, you should use the mongo command, not mongod. mongod starts up a server on your local machine; I'm not sure what passing it a URL will do.

Try

mongo remotehost:27017

From the documentation:

Use map/reduce instead of db.eval() for long running jobs. db.eval blocks other operations!

eval is a function that blocks the entire server unless you use a special flag. Again, from the docs:

If you don't use the "nolock" flag, db.eval() blocks the entire mongod process while running [...]
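From the shell, nolock is passed through the underlying eval command rather than db.eval() itself; a rough sketch, reusing the hypothetical findFirstViolation function from the question:

db.runCommand({
    eval: findFirstViolation,
    nolock: true    // skip the global write lock; the function must not write
});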

You are kind of abusing MongoDB here. Your current routine is strange because it returns the first violation found, but it will have to re-check everything the next time it runs (unless your user ids are ordered and you store the last evaluated user id).
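If you do stay with the cursor approach, that checkpointing idea could look roughly like this (the checkpoints collection and its field names are made up for illustration):

// resume the scan after the last fully-checked user
var ckpt = db.checkpoints.findOne({_id: "cheat_scan"});
var query = ckpt ? {user_id: {$gt: ckpt.user_id}} : {};
var cursor = db.requests.find(query).sort({user_id: 1, time: 1});
var lastUser = null;
while (cursor.hasNext()) {
    var doc = cursor.next();
    // ...same per-user violation check as in the question...
    lastUser = doc.user_id;
}
if (lastUser !== null) {
    db.checkpoints.save({_id: "cheat_scan", user_id: lastUser});   // record progress
}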

Map/reduce is generally the better option for a long-running task, but aggregating your data does not seem trivial. However, a map/reduce-based solution would also solve the re-evaluation problem.

I'd probably return something like this from map/reduce:

user id -> suspicious actions, e.g.
------
2525454 -> [{logId: 235345435, t: ISODate("...")}]
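One way to produce that shape, as a sketch only (the collection and field names, and the maxMoneyPerMs limit, are assumptions carried over from the question):

db.requests.mapReduce(
    function () {
        // key each request by user, carrying just what the check needs
        emit(this.user_id, {entries: [{logId: this._id, t: this.time, money: this.money}]});
    },
    function (userId, values) {
        // reduce may run repeatedly over partial results, so it only merges;
        // the actual cheat detection happens once, in finalize
        var all = [];
        values.forEach(function (v) { all = all.concat(v.entries); });
        return {entries: all};
    },
    {
        out: {replace: "suspicious_actions"},   // hypothetical output collection
        finalize: function (userId, reduced) {
            reduced.entries.sort(function (a, b) { return a.t - b.t; });
            var maxMoneyPerMs = 1;              // illustrative limit
            var suspicious = [];
            for (var i = 1; i < reduced.entries.length; i++) {
                var prev = reduced.entries[i - 1], cur = reduced.entries[i];
                if (cur.money - prev.money > maxMoneyPerMs * (cur.t - prev.t)) {
                    suspicious.push({logId: cur.logId, t: cur.t});
                }
            }
            return suspicious;
        }
    }
);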
