
MongoDB concurrency reduces performance

I understand that MongoDB performs locking on read and write operations.

My Use case:

Only read operations. No write operations.

I have a collection with about 10 million documents. The storage engine is WiredTiger.

The MongoDB version is 3.4.

I made a request that returns about 30k documents - it took 650 ms on average.

When I made the same request concurrently - 100 times - the total time to handle all requests ranged from a few seconds up to 2 minutes.

I have a single node serving the data.

How do I access the data:

Each document contains 25 to 40 fields. I have indexed a few fields, and I query on one indexed field.

The API returns all matching documents as JSON.
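Roughly, the access pattern looks like the following mongo shell sketch; the collection and field names (events, deviceId) are placeholders, not the actual schema:

    // hypothetical names - the real documents have 25-40 fields each
    db.events.createIndex({ deviceId: 1 })        // one of the few indexed fields
    db.events.find({ deviceId: "device-42" })     // query on a single indexed field,
                                                  // matching ~30k documents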

Other information: the API is written using Spring Boot.

Concurrency was tested with a JMeter script run from the command line on a remote machine.

So, my questions:

  1. Am I missing any optimizations (storage engine settings, version)?
  2. Can I get all read requests served in under a second?
  3. If so, what SLA can I set for this use case?

Any suggestions?

Edit:

I enabled the database profiler in MongoDB at level 2.
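For reference, that was done with the standard profiler commands in the mongo shell (level 2 records all operations, which adds some overhead of its own):

    // capture every operation on the current database
    db.setProfilingLevel(2)
    // inspect the captured operations, slowest first
    db.system.profile.find().sort({ millis: -1 }).limit(10)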

My single query is internally converted to 4 operations:

  1. Initial read
  2. getMore
  3. getMore
  4. getMore

These are the operations found through the profiler.

In total, it takes less than 100 ms. Is that really true?

My concurrent queries:

Now, when I fire 100 requests, nearly 150 operations take more than 100 ms, 100 operations take more than 200 ms, and 90 operations take more than 300 ms.

Based on my single-query analysis, 100 requests are converted to 400 operations internally. This is a fixed pattern, which I verified by checking the query tag in the profiler output.

I suspect this is what is affecting my request performance.

My single query is internally converted to 4 operations:

  1. Initial read
  2. getMore
  3. getMore
  4. getMore

That's the way mongo cursors work. The documents are transferred from the db to the app in batches. IIRC the first batch is around 100 documents plus a cursor id, then subsequent getMore calls retrieve the next batches by cursor id.
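For illustration, this is roughly what the driver does under the hood, expressed as raw commands in the mongo shell; the collection name, filter and cursor id below are placeholders:

    // initial read: returns the first batch plus a cursor id
    db.runCommand({ find: "events", filter: { deviceId: "device-42" } })
    // => { cursor: { id: NumberLong("123456"), firstBatch: [ /* ~101 docs */ ] }, ok: 1 }

    // each getMore fetches the next batch for that cursor until it is exhausted (id: 0)
    db.runCommand({ getMore: NumberLong("123456"), collection: "events" })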

You can define the batch size (number of documents in a batch) from the application. A batch cannot exceed 16 MB, e.g. if you set the batch size to 30,000, it will fit into a single batch only if the average document size is under ~500 B.
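For example, in the mongo shell (the drivers expose an equivalent cursor option, e.g. FindIterable.batchSize(int) in the Java driver used by Spring Boot); the names are placeholders again:

    // ask for larger batches so ~30k documents need fewer getMore round trips;
    // each batch is still capped at 16MB of BSON
    db.events.find({ deviceId: "device-42" }).batchSize(10000)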

Your investigation clearly shows performance degradation under load. There are too many factors, and I believe locking is not one of them. WiredTiger takes exclusive document-level locks only for regular write operations, and you are doing only reads during your tests, aren't you? If in any doubt, you can compare the output of db.serverStatus().locks before and after the tests to see how many write locks were acquired. You can also run db.serverStatus().globalLock during the tests to check the queue. More details about locking and concurrency are here: https://docs.mongodb.com/manual/faq/concurrency/#for-wiredtiger
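For example, in the mongo shell, before/during/after the JMeter run:

    // snapshot the lock counters before the test, and again after it, then compare
    var locksBefore = db.serverStatus().locks
    // ... run the load test ...
    var locksAfter = db.serverStatus().locks

    // during the test: non-zero currentQueue.readers / currentQueue.writers
    // means operations are queuing behind the global lock
    db.serverStatus().globalLock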

The bottleneck is likely somewhere else. There are a few generic things to check:

  • Query optimisation. Ensure you use indexes. The profiler output should show no "COLLSCAN" stage in the execStats field (see the explain() sketch after this list).
  • System load. If your database shares system resources with the application, it may affect the performance of the database. E.g. BSON-to-JSON conversion in your API is quite CPU-hungry and may affect the performance of the queries. Check the system's load average with top or htop on *nix systems.
  • MongoDB resources. Use mongostat and mongotop to check whether the server has enough RAM, IO, file descriptors, connections, etc.
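For the first point, a quick check in the mongo shell (placeholder names): the winning plan should contain an IXSCAN stage rather than a COLLSCAN:

    // executionStats shows the chosen plan and the number of documents/keys examined
    db.events.find({ deviceId: "device-42" }).explain("executionStats")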

If you cannot spot anything obvious, I'd recommend seeking professional help. I find the simplest way to get it is to export the data to Atlas and run your tests against the cluster. Then you can ask the support team whether they can advise any improvements to the queries.
