
MongoDB Aggregation query running very slow

We version most of our collections in MongoDB. The selected versioning mechanism is as follows:

{  "docId" : 174, "v" : 1,  "attr1": 165 }   /*version 1 */
{  "docId" : 174, "v" : 2,  "attr1": 165, "attr2": "A-1" } 
{  "docId" : 174, "v" : 3,  "attr1": 184, "attr2" : "A-1" }

So, when we run our queries, we always need to use the aggregation framework in this way to ensure we get the latest versions of our objects:

db.docs.aggregate( [  
    {"$sort":{"docId":-1,"v":-1}},
    {"$group":{"_id":"$docId","doc":{"$first":"$$ROOT"}}},
    {"$match":{<query>}}
] );

The problem with this approach is that once you have done the grouping, you have a set of data in memory that has nothing to do with your collection, so your indexes cannot be used.
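To see why the trailing $match cannot rely on an index, it can help to emulate the first two stages in plain JavaScript. The sketch below is an in-memory illustration only, not how MongoDB implements these stages; `latestVersions` is a hypothetical helper, and the sample documents are the ones shown above.

```javascript
// Sample documents taken from the question.
const docs = [
  { docId: 174, v: 1, attr1: 165 },
  { docId: 174, v: 2, attr1: 165, attr2: "A-1" },
  { docId: 174, v: 3, attr1: 184, attr2: "A-1" },
];

// Hypothetical helper that mimics the $sort and $group stages in memory.
function latestVersions(documents) {
  // Rough equivalent of {"$sort": {"docId": -1, "v": -1}}
  const sorted = [...documents].sort((a, b) => b.docId - a.docId || b.v - a.v);

  // Rough equivalent of {"$group": {"_id": "$docId", "doc": {"$first": "$$ROOT"}}}
  const groups = new Map();
  for (const d of sorted) {
    if (!groups.has(d.docId)) {
      groups.set(d.docId, { _id: d.docId, doc: d });
    }
  }
  return [...groups.values()];
}

const grouped = latestVersions(docs);

// The grouped results are new { _id, doc } documents, so a later filter
// addresses "doc.attr1" and has to examine every group one by one.
const matched = grouped.filter((g) => g.doc.attr1 === 184);
```

The grouped output has a different shape from the stored documents, which is why an index defined on the collection's own fields does not apply to the final filter.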

As a result, the more documents your collection has, the slower the query gets.

Is there any way to speed this up?

If not, I will consider moving to one of the approaches described in this good post: http://www.askasya.com/post/trackversions/

To complete this question: we went with option 3, one collection to keep the latest versions and one collection to keep the historical ones. It is introduced here: http://www.askasya.com/post/trackversions/ and a further description (with some nice code snippets) can be found at http://www.askasya.com/post/revisitversions/ .
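As a rough illustration of the write path such a split implies, the sketch below derives the two documents to store when a document changes. It is only an assumption about how this could look, not the code from the linked posts: the helper `nextVersion`, the field names `docId` and `v` inside the history `_id`, and the collection names in the comments are all hypothetical.

```javascript
// Hypothetical helper: given the current document and a set of changes,
// build the archive copy and the new current document.
function nextVersion(current, changes) {
  const { _id, ...fields } = current;

  // Archive copy of the version being replaced, keyed by (docId, v).
  const historyDoc = { _id: { docId: _id, v: current.v }, ...fields };

  // New current document: same _id, version bumped by one.
  const currentDoc = { ...current, ...changes, _id: _id, v: current.v + 1 };

  return { historyDoc, currentDoc };
}

const result = nextVersion(
  { _id: 174, v: 2, attr1: 165, attr2: "A-1" },
  { attr1: 184 }
);

// In the mongo shell this could then be applied roughly as
// (collection names are assumptions):
//   db.docs_history.insertOne(result.historyDoc);
//   db.docs_current.replaceOne({ _id: result.currentDoc._id }, result.currentDoc);
```

The two writes are separate operations, so they are not atomic unless they run inside a transaction; the order of the writes and the handling of a failure between them are design decisions this sketch leaves open.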

It has been running in production for 6 months now. So far so good. The former approach meant we were always using the aggregation framework, which stops using indexes as soon as you change the shape of the original documents (using $group, $project...), because the result no longer matches the original collection. This made our performance terrible as the data grew.

With the new approach, though, the problem is gone. 90% of our queries go against the latest data, which means we target a collection with a simple ObjectId as identifier and no longer need the aggregation framework, just regular finds.

Our queries against historical data always include id and version, so by indexing these (we store both in _id, so we get the index out of the box), reads against those collections are equally fast. This point should not be overlooked, though. Read patterns in your application are crucial when designing how your collections/schemas should look in MongoDB, so make sure you know them when making such decisions.
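The two read patterns described above could be expressed with filters like the ones in this sketch. The helper names, the `docId`/`v` field names inside `_id`, and the collection names in the comments are assumptions for illustration only.

```javascript
// Hypothetical helper: filter for one specific historical version.
// An equality match on an embedded _id document is sensitive to field
// order, so the filter is always built in the same (docId, v) order.
function historyFilter(docId, v) {
  return { _id: { docId: docId, v: v } };
}

// Hypothetical helper: filter for the latest version of a document.
function currentFilter(docId) {
  return { _id: docId };
}

const oldVersion = historyFilter(174, 2);
const latest = currentFilter(174);

// In the mongo shell (collection names are assumptions):
//   db.docs_history.find(oldVersion);
//   db.docs_current.find(latest);
```

Both filters are equality matches on `_id`, which every collection indexes by default, so neither lookup needs an additional index.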
