MongoDB Performance Issue

I am using MongoDB and I need to update my documents, say 1000 in total. Each document has a basic structure like:

{
    People: [1, 2, 3, 4],
    Place: "Auckland",
    Event: "Music Show"
}

I have 10,000 threads running concurrently in another VM. Each thread looks at these 1000 documents, checks which of them match its query, and pushes a number into the People array of every match. For example, if thread 100 finds 500 of the 1000 documents relevant, it pushes the number 100 into the People array of all 500 documents. For this,

I am using, in each of the 10,000 threads, the commands

update.append("$push", new BasicDBObject("People", serial_number));
collection.updateMulti(query, update); // collection: the DBCollection holding these documents
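
For reference, here is a minimal self-contained sketch of what each thread does, using the legacy Java driver. The database and collection names ("mydb", "events") and the example query are assumptions for illustration, not from the original post:

import com.mongodb.BasicDBObject;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.MongoClient;

public class PushToMatches {
    public static void main(String[] args) throws Exception {
        MongoClient client = new MongoClient("localhost", 27017);
        DB db = client.getDB("mydb");                     // hypothetical database name
        DBCollection events = db.getCollection("events"); // hypothetical collection name

        int serial_number = 100; // this thread's serial number

        // Match the documents this thread considers relevant,
        // e.g. all events at a given place.
        BasicDBObject query = new BasicDBObject("Place", "Auckland");

        // Push the serial number into the People array of every match.
        BasicDBObject update = new BasicDBObject();
        update.append("$push", new BasicDBObject("People", serial_number));
        events.updateMulti(query, update);

        client.close();
    }
}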

I observe poor performance for these in-place updates (multi-document updates). Is this due to the write lock? Every one of the 10,000 threads updates the documents relevant to its query, so there seems to be a lot of waiting. Is there a more efficient way to do these push operations? Is updateMulti the right way to approach this?

Thank you for a great response. Editing and adding some more information.

Some design background:

Yes, your reading of our problem is correct. We have 10,000 threads, each representing one "actor", updating up to 1,000 entities (based on the appropriate query) at a time with a $push.

Inverting the model leads us to a few broken use cases (from our domain perspective), forcing joins across "states" of the primary entity (which would then be spread across many collections). For example, each of these actions is a state change for that entity: E has states (e1, e2, e3, e4, e5), so e1 to e5 is represented as an aggregate array which gets updated by the 10,000 threads/processes representing the actions of external apps.

We need close to real-time aggregation, since another set of "actors" looks at these "states" e1 to e5 and then responds appropriately, via another channel, to the "elements in the array".

What should be the "ideal" design strategy in such a case to speed up the writes? Will sharding help? Is there a magnitude heuristic for this, e.g. at what lock percentage should we shard?

This is a problem because of your schema design.

It is extremely inefficient to $push multiple values to multiple documents, especially from multiple threads. It's not so much that the write lock itself is the problem; it's that your design makes it the problem. In addition, you are continuously growing documents, which means the updates are not "in place" and your collection quickly becomes fragmented.

It seems like your schema is "upside down". You have 10,000 threads looking to add numbers representing people (I assume a very large number of people) to a small number of documents (1000) which will grow to be huge. It seems to me that if you want to embed something in something else, you might consider collections representing people and then embedding the events those people are found at. At that point you are limiting the size of the array for each person to 1,000 at most, and the updates will be spread across a much larger number of documents, reducing contention significantly.
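
As a minimal sketch of that inverted schema, assuming a hypothetical "people" collection keyed by the actor's serial number: each thread pushes an event into its own person document instead of pushing its number into many shared event documents, so writes no longer pile up on the same 1000 documents.

import com.mongodb.BasicDBObject;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.MongoClient;

public class InvertedSchema {
    public static void main(String[] args) throws Exception {
        MongoClient client = new MongoClient("localhost", 27017);
        DB db = client.getDB("mydb"); // hypothetical database name

        // One document per person; events are embedded in that person's
        // document, so each thread writes only to its own document and
        // the embedded array is capped at roughly 1,000 entries.
        DBCollection people = db.getCollection("people"); // hypothetical collection name

        BasicDBObject query = new BasicDBObject("_id", 100); // this thread's serial number
        BasicDBObject event = new BasicDBObject("Place", "Auckland")
                .append("Event", "Music Show");
        BasicDBObject update = new BasicDBObject("$push",
                new BasicDBObject("Events", event));

        // upsert=true creates the person document on first write; multi=false
        // since each thread targets exactly one document.
        people.update(query, update, true, false);

        client.close();
    }
}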

Another option is simply to record the event/person attendance as raw data and then aggregate over it later, but without knowing exactly what your requirements for this application are, it's hard to know which way will produce the best results. The way you have picked is definitely one that's unlikely to give you good performance.
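
As a sketch of that alternative, again with hypothetical collection and field names ("attendance", "event", "place", "person"): each observation becomes one small, fixed-size insert, which never grows a document, and the people-per-event view is rebuilt on demand with the aggregation framework.

import com.mongodb.AggregationOutput;
import com.mongodb.BasicDBObject;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.DBObject;
import com.mongodb.MongoClient;

public class RawLogAggregation {
    public static void main(String[] args) throws Exception {
        MongoClient client = new MongoClient("localhost", 27017);
        DB db = client.getDB("mydb"); // hypothetical database name

        // Append-only log: one small document per (event, person) observation,
        // so concurrent writers never contend on the same document.
        DBCollection attendance = db.getCollection("attendance"); // hypothetical collection name
        attendance.insert(new BasicDBObject("event", "Music Show")
                .append("place", "Auckland")
                .append("person", 100));

        // Aggregate on read: collect the set of people per event.
        AggregationOutput out = attendance.aggregate(
                new BasicDBObject("$match", new BasicDBObject("place", "Auckland")),
                new BasicDBObject("$group", new BasicDBObject("_id", "$event")
                        .append("people", new BasicDBObject("$addToSet", "$person"))));
        for (DBObject row : out.results()) {
            System.out.println(row);
        }

        client.close();
    }
}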
