MongoDB aggregation query runs very slowly

Question

We version most of our collections in MongoDB. The selected versioning mechanism is as follows:

{  "docId" : 174, "v" : 1,  "attr1": 165 }   /*version 1 */
{  "docId" : 174, "v" : 2,  "attr1": 165, "attr2": "A-1" } 
{  "docId" : 174, "v" : 3,  "attr1": 184, "attr2" : "A-1" }

So, when we perform our queries, we always need to use the aggregation framework in this way to ensure we get the latest versions of our objects:

db.docs.aggregate( [
    {"$sort":{"docId":-1,"v":-1}},
    {"$group":{"_id":"$docId","doc":{"$first":"$$ROOT"}}},
    {"$match":{<query>}}
] );

The problem with this approach is that once you have done your grouping, you have a set of data in memory which has nothing to do with your collection, and thus your indexes cannot be used by the $match stage that follows.

As a result, the more documents your collection has, the slower the query gets.

Is there any way to speed this up?

If not, I will consider moving to one of the approaches defined in this good post: http://www.askasya.com/post/trackversions/

Answer 1

Just to complete this question: we went with option 3, one collection to keep the latest versions and one collection to keep historical ones. It is introduced here: http://www.askasya.com/post/trackversions/ and some further description (with some nice code snippets) can be found in http://www.askasya.com/post/revisitversions/ .

It has been running in production for 6 months now. So far so good. The former approach meant we were always using the aggregation framework, which moves away from indexes as soon as you modify the original schema (using $group, $project...) because the result no longer matches the original collection. This was making our performance terrible as the data grew.
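To illustrate the point about indexes, here is a minimal sketch of how the old pipeline could be assembled so that at least part of the filtering happens before $group, where a collection index can still apply. The helper name buildLatestVersionPipeline and its two parameters are assumptions made for illustration, not something from our code base. It only helps when part of the query targets fields that are identical in every version of a document (such as docId); conditions on versioned attributes still have to run after $group, against the in-memory "doc" sub-document, or they would match outdated versions.

```javascript
// Hypothetical helper: builds the "latest version" pipeline from the question.
// stableFilter: conditions on fields that never change between versions
//               (e.g. docId); placed first so an index can serve them.
// latestFilter: conditions on versioned attributes; these must run after
//               $group and address fields through the grouped "doc" field.
function buildLatestVersionPipeline(stableFilter, latestFilter) {
  const pipeline = [];
  if (Object.keys(stableFilter).length > 0) {
    pipeline.push({ $match: stableFilter });
  }
  // Newest version first within each docId, then keep the first per group.
  pipeline.push({ $sort: { docId: -1, v: -1 } });
  pipeline.push({ $group: { _id: "$docId", doc: { $first: "$$ROOT" } } });
  if (Object.keys(latestFilter).length > 0) {
    pipeline.push({ $match: latestFilter });
  }
  return pipeline;
}

// Example values are illustrative only.
const pipeline = buildLatestVersionPipeline(
  { docId: { $in: [174, 175] } },
  { "doc.attr1": 184 }
);
console.log(JSON.stringify(pipeline, null, 2));
// In the mongo shell this would be passed as: db.docs.aggregate(pipeline)
```

When no stable filter exists, the pipeline degenerates to the original one and every document is still sorted and grouped, which is the case that hurt us.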

With the new approach, though, the problem is gone. 90% of our queries go against the latest data, which means we target a collection with a simple ObjectId as identifier and we no longer need the aggregation framework, just regular finds.

Our queries against historical data always include id and version, so by indexing these (we include both in _id, so we get the index out of the box), reads against those collections are equally fast. This is a point not to overlook, though. Read patterns in your application are crucial when designing how your collections/schemas should look in MongoDB, so you must make sure you know them when making such decisions.
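The two-collection layout can be sketched roughly as follows. This is an in-memory simulation in plain JavaScript, not our production code: the Maps current and history stand in for the two collections, the string key "id:v" stands in for a compound _id holding id and version, and saveVersion is a hypothetical helper name. It also assumes that only superseded versions are copied to the historical collection.

```javascript
// In-memory stand-ins for the two collections (hypothetical names).
const current = new Map(); // key: document id, value: latest version only
const history = new Map(); // key: "id:v", standing in for a compound _id

// Hypothetical helper: store a new version of a document.
function saveVersion(id, attrs) {
  const previous = current.get(id);
  if (previous !== undefined) {
    // Archive the outgoing version under its (id, version) key.
    history.set(`${id}:${previous.v}`, previous);
  }
  const next = { _id: id, v: previous ? previous.v + 1 : 1, ...attrs };
  current.set(id, next);
  return next;
}

// Replay the three versions from the question.
saveVersion(174, { attr1: 165 });
saveVersion(174, { attr1: 165, attr2: "A-1" });
saveVersion(174, { attr1: 184, attr2: "A-1" });

// Latest version: a single keyed lookup, no aggregation involved.
console.log(current.get(174));
// Historical version: lookup by id and version together.
console.log(history.get("174:2"));
```

Both reads are direct key lookups, which mirrors why regular finds on _id are enough. In a real deployment the archive and the update are two separate writes, and their ordering and atomicity need thought that this sketch leaves out.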

Answer 1: accepted, 2018-01-21 11:47:36
