
MongoDB performance issue

I am using MongoDB and I need to update my documents, of which there are about 1,000 in total. Each document has a basic structure like:

{
  People: [1, 2, 3, 4],
  Place: "Auckland",
  Event: "Music Show"
}

I have 10,000 threads running concurrently in another VM. Each thread scans these 1,000 documents, checks which of them match its query, and pushes a number onto the People array of every match. For example, if thread 100 finds 500 of the 1,000 documents relevant, it pushes the number 100 onto the People array of all 500 documents. For this,

in each of the 10,000 threads I run the command

update.append("$push", new BasicDBObject("People", serial_number));
collection.updateMulti(query, update);
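
For context, here is a minimal self-contained sketch of that update using the legacy MongoDB Java driver. The host, the database and collection names, the placeholder query, and the hard-coded serial_number value are all assumptions for illustration, not details from the original post:

import com.mongodb.BasicDBObject;
import com.mongodb.DBCollection;
import com.mongodb.DBObject;
import com.mongodb.MongoClient;

public class PushUpdate {
    public static void main(String[] args) throws Exception {
        MongoClient mongo = new MongoClient("localhost", 27017);
        // Assumed database/collection names for the sketch.
        DBCollection events = mongo.getDB("demo").getCollection("events");

        // Placeholder criteria for "documents this thread finds relevant".
        DBObject query = new BasicDBObject("Place", "Auckland");

        int serial_number = 100; // this thread's number (illustrative)
        DBObject update = new BasicDBObject("$push",
                new BasicDBObject("People", serial_number));

        // Pushes serial_number onto the People array of every matching document.
        events.updateMulti(query, update);

        mongo.close();
    }
}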

I observe poor performance for these in-place updates (multi-document updates). Is this a problem caused by the write lock? Each of the 10,000 threads updates whichever documents match its query, so there seems to be a lot of "waiting". Is there a more efficient way to do these $push operations? Is updateMulti the right way to approach this?

Thank you for a great response. Editing to add some more information.

Some design background:

Yes, your reading of our problem is correct. We have 10,000 threads, each representing one "actor", updating up to 1,000 entities (based on the appropriate query) at a time with a $push.

Inverting the model breaks a few use cases (from our domain perspective), because it would force joins across "states" of the primary entity, which would then be spread across many collections. For example: each of these actions is a state change for that entity. E has states (e1, e2, e3, e4, e5), so e1 to e5 is represented as an aggregate array that gets updated by the 10,000 threads/processes, which represent actions of external apps.

We need close-to-real-time aggregation, as another set of "actors" looks at these "states" e1 to e5 and then responds appropriately, via another channel, to the "elements in the array".

What would be the "ideal" design strategy in such a case to speed up the writes? Will sharding help? Is there a "magnitude" heuristic for this, e.g. at what lock percentage should we shard?

This is a problem because of your schema design.

It is extremely inefficient to $push multiple values to multiple documents, especially from multiple threads. It's not so much that the write lock is the problem; it's that your design makes it the problem. In addition, you are continuously growing documents, which means that the updates are not "in place": once a document outgrows the space allocated to it, the server has to relocate it on disk and update its index entries, so your collection quickly becomes fragmented.
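
As a side note, on MMAPv1-era MongoDB (which this question predates WiredTiger, so it applies), one common mitigation for that fragmentation was the usePowerOf2Sizes collection option, set via the collMod command. A hedged sketch, reusing the assumed demo/events names from above:

import com.mongodb.BasicDBObject;
import com.mongodb.CommandResult;
import com.mongodb.DB;
import com.mongodb.MongoClient;

public class CollModExample {
    public static void main(String[] args) throws Exception {
        MongoClient mongo = new MongoClient("localhost", 27017);
        DB db = mongo.getDB("demo"); // assumed database name

        // Allocate record space in powers of two (MongoDB 2.2+, MMAPv1) so that
        // when a grown document is moved, its old slot is easier to reuse,
        // which limits the fragmentation described above. It does not make the
        // updates "in place" again; it only softens the cost of the moves.
        CommandResult result = db.command(new BasicDBObject("collMod", "events")
                .append("usePowerOf2Sizes", true));
        System.out.println(result);

        mongo.close();
    }
}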

It seems like your schema is "upside down". You have 10,000 threads adding numbers representing people (I assume a very large number of people) to a small number of documents (1,000) that will grow to be huge. If you want to embed something in something else, consider instead a collection of documents representing people, with the events each person attends embedded in that person's document (a sketch follows below). At least then you are limiting the array for each person to 1,000 entries at most, and the updates are spread across a much larger number of documents, reducing contention significantly.
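
To make the inversion concrete, here is a hedged sketch of what a per-person document and its update could look like with the same legacy driver. The people collection name, the field names, and the upsert choice are illustrative assumptions, not something the answer prescribes:

import com.mongodb.BasicDBObject;
import com.mongodb.DBCollection;
import com.mongodb.DBObject;
import com.mongodb.MongoClient;

public class InvertedSchema {
    public static void main(String[] args) throws Exception {
        MongoClient mongo = new MongoClient("localhost", 27017);
        DBCollection people = mongo.getDB("demo").getCollection("people"); // assumed names

        int serial_number = 100; // the person/thread identifier (illustrative)

        // One document per person; the events that person attends are embedded,
        // so each array holds at most ~1,000 entries.
        DBObject query = new BasicDBObject("_id", serial_number);
        DBObject event = new BasicDBObject("Place", "Auckland").append("Event", "Music Show");
        DBObject update = new BasicDBObject("$addToSet", new BasicDBObject("Events", event));

        // Upsert creates the person document on first use; writes are now spread
        // over 10,000 person documents instead of contending on 1,000 hot ones.
        people.update(query, update, true /* upsert */, false /* multi */);

        mongo.close();
    }
}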

Another option is simply to record each event/person attendance as its own document and then aggregate over the raw data later (sketched below). Without knowing exactly what your requirements for this application are, it's hard to say which way will produce the best results, but the way you have picked is definitely one that's unlikely to give you good performance.
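
A hedged sketch of that raw-data approach: each thread performs one small, fixed-size insert instead of growing a shared document, and an aggregation rolls the data up per event afterwards. The attendance collection name and field names are assumptions for illustration:

import java.util.Date;

import com.mongodb.AggregationOutput;
import com.mongodb.BasicDBObject;
import com.mongodb.DBCollection;
import com.mongodb.DBObject;
import com.mongodb.MongoClient;

public class RawAttendanceLog {
    public static void main(String[] args) throws Exception {
        MongoClient mongo = new MongoClient("localhost", 27017);
        DBCollection attendance = mongo.getDB("demo").getCollection("attendance"); // assumed

        // Each thread does one tiny insert; no document ever grows in place.
        attendance.insert(new BasicDBObject("person", 100)
                .append("event", "Music Show")
                .append("ts", new Date()));

        // Later (or on a schedule), roll the raw records up per event.
        DBObject group = new BasicDBObject("$group",
                new BasicDBObject("_id", "$event")
                        .append("people", new BasicDBObject("$addToSet", "$person")));
        AggregationOutput out = attendance.aggregate(group);
        for (DBObject row : out.results()) {
            System.out.println(row);
        }

        mongo.close();
    }
}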
