
MongoDB aggregation very slow when $match key is not in the index (does it do a table scan?)

db.collection.aggregate([
    { "$match" : { "key" : "mykey" } },
    { "$sort" : { "time" : -1 } },
    { "$limit" : 1 }
])

Example document:

{
   key: "key1",
   time: ISODate("2014-07-04T20:04:46.904Z")
}

Indexes:

"time" : -1,
"key" : 1,
"_id" : 1

When "mykey" exists in the collection, the query takes 30 ms; when "mykey" does not exist, it takes 10 s. explain shows that the indexes are used. This is a capped collection, so it often happens that keys are missing. Why does it take that long? BTW, this is MongoDB 2.4.

Further exploration:

Removing the index on the sort field reduces the lookup time.

Running explain on the aggregation with and without an index on the sort field shows that with the index, the sort is executed at the start of the pipeline; without it, the sort is executed as the last step of the pipeline.

Your query is an equality match on key and a sort on time, which means you are using the wrong index for it (in essence, your index is on time:1, key:1).

For the query you are running, the index should have key:1, time:1 as its first two fields in order to help effectively. With that index, the query can jump directly to the matching key value; if that key has multiple time values, they are already sorted, so the highest one can be fetched immediately. If the key is not found in the index, the query is done.
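To make that lookup concrete, here is a minimal JavaScript sketch that models an index on { key: 1, time: 1 } as an in-memory array sorted by key, then time. It is a simplified illustration, not how MongoDB's B-tree indexes are implemented: the function names (compareKeyTime, upperBoundByKey, latestForKey) and the sample data are hypothetical, and times are plain numbers for brevity. A binary search jumps to the end of the key's range, so the newest entry is found after a logarithmic number of comparisons, and a missing key is detected just as quickly. In the MongoDB 2.4 shell, the index itself would be created with db.collection.ensureIndex({ key: 1, time: 1 }).

```javascript
// Hypothetical in-memory model of an index on { key: 1, time: 1 }:
// entries are ordered by key first, then by time (times are plain numbers here).
function compareKeyTime(a, b) {
  if (a.key < b.key) return -1;
  if (a.key > b.key) return 1;
  return a.time - b.time;
}

// Binary search: position of the first entry whose key sorts after `key`.
function upperBoundByKey(index, key) {
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (index[mid].key <= key) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Models $match { key } -> $sort { time: -1 } -> $limit 1 on this index.
function latestForKey(index, key) {
  const pos = upperBoundByKey(index, key);
  // The entry just before the upper bound is the highest-time entry for
  // `key`, provided the key exists at all.
  if (pos > 0 && index[pos - 1].key === key) return index[pos - 1];
  return null; // key absent: detected after O(log n) comparisons, no full scan
}

const docs = [
  { key: "key1", time: 3 },
  { key: "key2", time: 1 },
  { key: "key1", time: 7 },
  { key: "key3", time: 5 },
];
const index = [...docs].sort(compareKeyTime);

console.log(latestForKey(index, "key1"));  // → { key: 'key1', time: 7 }
console.log(latestForKey(index, "mykey")); // → null
```

Because key leads the sort order, all entries for one key are contiguous and already ordered by time, which is exactly what lets the "newest entry for this key" question be answered without touching entries for other keys.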

As it is, the query is forced to walk the index in time order (time is the leading field) until it finds the first entry with a matching key, at which point it can return. When the key you are looking for doesn't exist, the query ends up scanning the entire index before it can return.
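The cost of that scan can be sketched the same way. The following minimal JavaScript model treats the existing index as an array in time-descending order; the trailing key and _id fields are ignored here on the assumption that time values are distinct, so time alone fixes the order. The function name scanTimeLedIndex and the synthetic data are hypothetical, and this is a simplified illustration of the access pattern described above, not MongoDB's query engine. It walks entries newest-first and counts how many it examines before returning.

```javascript
// Hypothetical model of the existing index, led by time: entries are kept
// in time-descending order, matching the { time: -1 } sort.
function scanTimeLedIndex(index, key) {
  let examined = 0;
  for (const entry of index) {
    examined += 1;
    // Index order already satisfies the sort, so the first entry whose key
    // matches is the answer and the scan can stop right there.
    if (entry.key === key) return { entry, examined };
  }
  // No entry matched: every index entry had to be examined.
  return { entry: null, examined };
}

// Synthetic data: 100000 entries, keys cycling through key0..key9.
const N = 100000;
const index = [];
for (let i = 0; i < N; i++) {
  index.push({ key: "key" + (i % 10), time: i });
}
index.sort((a, b) => b.time - a.time); // newest first

console.log(scanTimeLedIndex(index, "key3").examined);  // → 7
console.log(scanTimeLedIndex(index, "mykey").examined); // → 100000
```

A key that appears near the newest end of the index is found after a handful of entries, while an absent key forces every entry to be examined. That asymmetry is consistent with the 30 ms vs. 10 s difference reported in the question, and it grows with the size of the collection.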
