
count() is a slow operation in MongoDB. How can I speed up count() specifically?

I have a collection called ParseRequest. It is sharded with shard key _id. Probably not the best choice, but right now I do not think it matters, as the collection has only 40,000 documents. There are two properties on the ParseRequest collection that I am concerned with in this case: processed (Boolean) and parsed (Boolean).

I need to run this query and I want it to be lightning fast:

db.ParseRequest.count({processed: true, parsed: true})

So I tried two different ways:

  1. Create a separate compound index on processed and parsed
  2. Include processed and parsed in the shard key

Both ways improve performance, but not enough: the count() above runs in 2-3 seconds or so, and I need it to be much faster than that.

What is noteworthy is that this query returns in no time (a few milliseconds):

db.ParseRequest.find({processed: true, parsed: true}).limit(5)

But

db.ParseRequest.count({processed: true, parsed: true})

is still slow in either setup.

Is there anything else I should try?

Departing from the specific example above, it looks like count() with specific criteria is, in general, a very expensive operation in MongoDB. Even with an index, counting is still slow, way slower than getting the first few rows for the same criteria. Is there any reason for that?

I am coming from a SQL Server background: calculating count(*) was always fast in SQL Server. Calculating counts is important to my app, and my frustration with MongoDB has grown so much that I am considering dumping MongoDB for that reason alone: it is slow to calculate count() for specific criteria. But before I do that, I want to be sure that I have exhausted all possible ways to improve the count calculation. Any suggestion is appreciated. Thank you.

-=-=-

Edit after the first few comments:

I use v2.2.6 running on CentOS (64-bit).

Yes, explain says the index is used and, by the way, without the index it is even slower.

Yes, I understand that to calculate count() for specific criteria the entire index tree needs to be scanned, but please excuse my comparison to SQL Server: in SQL Server it is also a complete index scan, and yet somehow count(*) is faster, everything else being equal. So is there any trick I can use in MongoDB?
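To illustrate the gap between find(...).limit(5) and count(...) described above, here is a toy model in plain JavaScript. It is only a sketch under loud assumptions: a sorted array stands in for an index on processed and parsed, and buildIndex, lowerBound and scan are hypothetical helpers, not MongoDB APIs. It says nothing about how MongoDB 2.2 is actually implemented; it only shows that a limited scan can stop early, while a count has to visit every matching index entry.

```javascript
// Toy model only: a sorted array stands in for an index on {processed, parsed}.
// None of these function names are MongoDB APIs.
function buildIndex(docs) {
  // Encode the two booleans as one sortable key: 0..3, where 3 = both true.
  return docs
    .map((d, i) => ({ key: (d.processed ? 2 : 0) + (d.parsed ? 1 : 0), pos: i }))
    .sort((a, b) => a.key - b.key);
}

// Binary search for the first index entry whose key is >= target.
function lowerBound(index, target) {
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (index[mid].key < target) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Walk matching entries from the start of the range and return how many
// entries were visited; stop early once `limit` entries have been seen.
function scan(index, target, limit = Infinity) {
  let visited = 0;
  for (
    let i = lowerBound(index, target);
    i < index.length && index[i].key === target && visited < limit;
    i++
  ) {
    visited++;
  }
  return visited;
}

// 40,000 made-up documents; a quarter of them have both flags set.
const docs = Array.from({ length: 40000 }, (_, i) => ({
  processed: i % 2 === 0,
  parsed: i % 4 === 0,
}));
const index = buildIndex(docs);

const limited = scan(index, 3, 5); // analogous to find(...).limit(5)
const counted = scan(index, 3);    // analogous to count(...)
console.log(limited, counted);
```

In this model the limited scan touches 5 entries, while the count touches every entry with both flags set, so its cost grows with the number of matches rather than with the limit.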

I would try upgrading to at least version 2.4.x. There was a performance fix for count() that was released in 2.3.2: https://jira.mongodb.org/browse/SERVER-1752
