count() is a slow operation in MongoDB; how can I speed it up?

Asked 2014-04-10 16:43:57 · Tags: mongodb

I have a collection called ParseRequest. It is sharded with the shard key _id. That is probably not the best choice, but right now I do not think it matters, as the collection only has 40,000 documents. There are two properties on the ParseRequest collection that I am concerned with in this case: processed (Boolean) and parsed (Boolean).

I need to run this query and I want it to be lightning fast:

db.ParseRequest.count({processed: true, parsed: true})

So I tried two different approaches:

Create a separate compound index on processed and parsed
Include processed and parsed in the shard key

Both approaches improve performance, but not enough: the count() above runs in 2-3 seconds or so, and I need it to be much faster than that.

Notably, this query returns in no time (a few milliseconds):

db.ParseRequest.find({processed: true, parsed: true}).limit(5)

But

db.ParseRequest.count({processed: true, parsed: true})

is still slow in either setup.

Is there anything else I should try?

Stepping back from this specific example, it looks like count() with specific criteria is, in general, a very expensive operation in MongoDB. Even with an index the count is still slow, much slower than fetching the first few documents for the same criteria. Is there any reason for that?

I am coming from a SQL Server background, where calculating count(*) was always fast. Calculating counts is important to my app, and my frustration with MongoDB has grown so much that I am considering dumping it for that reason alone: count() with specific criteria is slow. But before I do that, I want to be sure that I have exhausted all possible ways to improve the count calculation. Any suggestion is appreciated. Thank you.

-=-=-

Edit after the first few comments:

I use v2.2.6 running on CentOS (64-bit).

Yes, explain says the index is used, and by the way, without the index it is even slower.

Yes, I understand that to calculate count() for specific criteria the entire index tree needs to be scanned. But please excuse my comparison to SQL Server: there it is also a complete index scan, and yet somehow count(*) is faster, everything else being equal. So is there any trick I can use in MongoDB?
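
For readers wondering why find().limit(5) can return in milliseconds while count() on the same criteria does not, here is a toy model in plain JavaScript. It is not MongoDB code and makes no claim about how the server is implemented; the sorted array standing in for the index, the helper names (lowerBound, countMatching, findLimit), and the data distribution are all assumptions made up for illustration. It only shows that a limited find can stop after a handful of index entries, whereas a count has to walk every matching entry.

```javascript
// Toy model only: a sorted array stands in for an index on (processed, parsed).
// Hypothetical data: 40,000 documents, of which every 4th has both flags true.
const docs = [];
for (let i = 0; i < 40000; i++) {
  docs.push({ processed: i % 2 === 0, parsed: i % 4 === 0 });
}

// Build the "index": one entry per document, sorted by key, then by id.
const index = docs
  .map((d, id) => ({ key: `${Number(d.processed)}${Number(d.parsed)}`, id }))
  .sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : a.id - b.id));

// Binary search for the first entry whose key is >= the wanted key.
function lowerBound(entries, key) {
  let lo = 0;
  let hi = entries.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (entries[mid].key < key) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Counting has to walk every entry that matches the key.
function countMatching(entries, key) {
  let visited = 0;
  for (let i = lowerBound(entries, key); i < entries.length && entries[i].key === key; i++) {
    visited++;
  }
  return { count: visited, entriesVisited: visited };
}

// A limited find can stop as soon as it has collected enough ids.
function findLimit(entries, key, limit) {
  const ids = [];
  for (
    let i = lowerBound(entries, key);
    i < entries.length && entries[i].key === key && ids.length < limit;
    i++
  ) {
    ids.push(entries[i].id);
  }
  return { ids, entriesVisited: ids.length };
}

const counted = countMatching(index, "11"); // both flags true
const limited = findLimit(index, "11", 5);
console.log(counted.entriesVisited, limited.entriesVisited);
```

In this made-up data set 10,000 of the 40,000 documents match, so the count walks 10,000 index entries while the limited find walks 5. The real server does much more per entry than this loop, but the ratio is the point.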

1 answer

I would try upgrading to at least version 2.4.x. There was a performance fix for count() that was released in 2.3.2: https://jira.mongodb.org/browse/SERVER-1752