
MongoDB: Slow query, even with index

I have a webpage that uses MongoDB to store and retrieve various measurements. At some point it suddenly became so sluggish that it was unusable. It turns out that my database is the culprit.

I searched for a solution to my problem and have not found one. I apologize, as I am pretty new to MongoDB and am pulling my hair out at the moment.

I am using MongoDB 2.4.6 on a virtual machine with 20 GB of RAM running Ubuntu Server 12.04. No replication or sharding is set up.

First, I set the profiling level to 2, and it revealed the slowest query:

> db.system.profile.find().sort({"millis":-1}).limit(1).pretty()
{
        "op" : "query",
        "ns" : "station.measurement",
        "query" : {
                "$query" : {
                        "e" : {
                                "$gte" : 0
                        },
                        "id" : "180"
                },
                "$orderby" : {
                        "t" : -1
                }
        },
        "ntoreturn" : 1,
        "ntoskip" : 0,
        "nscanned" : 3295221,
        "keyUpdates" : 0,
        "numYield" : 6,
        "lockStats" : {
                "timeLockedMicros" : {
                        "r" : NumberLong(12184722),
                        "w" : NumberLong(0)
                },
                "timeAcquiringMicros" : {
                        "r" : NumberLong(5636351),
                        "w" : NumberLong(5)
                }
        },
        "nreturned" : 0,
        "responseLength" : 20,
        "millis" : 6549,
        "ts" : ISODate("2015-03-16T08:57:07.772Z"),
        "client" : "127.0.0.1",
        "allUsers" : [ ],
        "user" : ""
}

I ran that specific query with .explain(), and it looks like it uses an index as it should, but it takes too long. I also ran the same query on another, drastically weaker server of mine, and it spat out the results like a champ in a second.

> db.measurement.find({"id":"180", "e":{$gte:0}}).sort({"t":-1}).explain()
{
        "cursor" : "BtreeCursor id_1_t_-1_e_1",
        "isMultiKey" : false,
        "n" : 0,
        "nscannedObjects" : 0,
        "nscanned" : 660385,
        "nscannedObjectsAllPlans" : 1981098,
        "nscannedAllPlans" : 3301849,
        "scanAndOrder" : false,
        "indexOnly" : false,
        "nYields" : 7,
        "nChunkSkips" : 0,
        "millis" : 7243,
        "indexBounds" : {
                "id" : [
                        [
                                "180",
                                "180"
                        ]
                ],
                "t" : [
                        [
                                {
                                        "$maxElement" : 1
                                },
                                {
                                        "$minElement" : 1
                                }
                        ]
                ],
                "e" : [
                        [
                                0,
                                1.7976931348623157e+308
                        ]
                ]
        },
        "server" : "station:27017"
}

Next, I looked at the indexes of the measurement collection, and they looked fine to me:

> db.measurement.getIndexes()
[
        {
                "v" : 1,
                "key" : {
                        "_id" : 1
                },
                "ns" : "station.measurement",
                "name" : "_id_"
        },
        {
                "v" : 1,
                "key" : {
                        "t" : 1
                },
                "ns" : "station.measurement",
                "name" : "t_1"
        },
        {
                "v" : 1,
                "key" : {
                        "id" : 1,
                        "d" : 1,
                        "_id" : -1
                },
                "ns" : "station.measurement",
                "name" : "id_1_d_1__id_-1"
        },
        {
                "v" : 1,
                "key" : {
                        "id" : 1,
                        "t" : -1,
                        "e" : 1
                },
                "ns" : "station.measurement",
                "name" : "id_1_t_-1_e_1"
        },
        {
                "v" : 1,
                "key" : {
                        "id" : 1,
                        "t" : -1,
                        "e" : -1
                },
                "ns" : "station.measurement",
                "name" : "id_1_t_-1_e_-1"
        }
]

Here is the rest of the information about my collection:

> db.measurement.stats()
{
        "ns" : "station.measurement",
        "count" : 157835456,
        "size" : 22377799512,
        "avgObjSize" : 141.77929395027692,
        "storageSize" : 26476834672,
        "numExtents" : 33,
        "nindexes" : 5,
        "lastExtentSize" : 2146426864,
        "paddingFactor" : 1.0000000000028617,
        "systemFlags" : 0,
        "userFlags" : 0,
        "totalIndexSize" : 30996614096,
        "indexSizes" : {
                "_id_" : 6104250656,
                "t_1" : 3971369360,
                "id_1_d_1__id_-1" : 8397896640,
                "id_1_t_-1_e_1" : 6261548720,
                "id_1_t_-1_e_-1" : 6261548720
        },
        "ok" : 1
}

I tried adding a new index, repairing the whole database, and reindexing. What am I doing wrong? I would really appreciate any help, as I have run out of ideas.

UPDATE 1:

I added two indexes as suggested by Neil Lunn, and some of the queries are a LOT faster:

{
                "v" : 1,
                "key" : {
                        "id" : 1,
                        "e" : 1,
                        "t" : -1
                },
                "ns" : "station.measurement",
                "name" : "id_1_e_1_t_-1",
                "background" : true
        },
        {
                "v" : 1,
                "key" : {
                        "id" : 1,
                        "e" : -1,
                        "t" : -1
                },
                "ns" : "station.measurement",
                "name" : "id_1_e_-1_t_-1",
                "background" : true
        }
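For reference, definitions like the two above could be created from the 2.4 shell roughly as follows. This is only a sketch: the helper name `addReorderedIndexes` is made up for illustration, and it assumes the collection is reachable as `db.measurement`.

```javascript
// Hypothetical helper (the name is not from the original post).
// `db` is the mongo shell's database handle for the "station" database.
function addReorderedIndexes(db) {
  // Equality field "id" first, then the range field "e", then the sort field "t".
  db.measurement.ensureIndex({ id: 1, e: 1, t: -1 }, { background: true });
  // Same layout with "e" descending, matching the second definition above.
  db.measurement.ensureIndex({ id: 1, e: -1, t: -1 }, { background: true });
}
```

In the shell this would be run as `addReorderedIndexes(db)`. The `background: true` option matches the definitions shown above; on newer MongoDB releases `createIndex` takes the place of `ensureIndex`.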

The results I got are interesting (though I am not sure they are relevant):

The next two queries differ only by "id". Notice that each query uses a different index. Why? Should I delete the older ones?

> db.measurement.find({"id":"119", "e":{$gte:0}}).sort({"t":-1}).explain()
{
        "cursor" : "BtreeCursor id_1_t_-1_e_1",
        "isMultiKey" : false,
        "n" : 840747,
        "nscannedObjects" : 840747,
        "nscanned" : 1047044,
        "nscannedObjectsAllPlans" : 1056722,
        "nscannedAllPlans" : 1311344,
        "scanAndOrder" : false,
        "indexOnly" : false,
        "nYields" : 4,
        "nChunkSkips" : 0,
        "millis" : 3730,
        "indexBounds" : {
                "id" : [
                        [
                                "119",
                                "119"
                        ]
                ],
                "t" : [
                        [
                                {
                                        "$maxElement" : 1
                                },
                                {
                                        "$minElement" : 1
                                }
                        ]
                ],
                "e" : [
                        [
                                0,
                                1.7976931348623157e+308
                        ]
                ]
        },
        "server" : "station:27017"
}

> db.measurement.find({"id":"180", "e":{$gte:0}}).sort({"t":-1}).explain()
{
        "cursor" : "BtreeCursor id_1_e_1_t_-1",
        "isMultiKey" : false,
        "n" : 0,
        "nscannedObjects" : 0,
        "nscanned" : 0,
        "nscannedObjectsAllPlans" : 0,
        "nscannedAllPlans" : 45,
        "scanAndOrder" : true,
        "indexOnly" : false,
        "nYields" : 0,
        "nChunkSkips" : 0,
        "millis" : 0,
        "indexBounds" : {
                "id" : [
                        [
                                "180",
                                "180"
                        ]
                ],
                "e" : [
                        [
                                0,
                                1.7976931348623157e+308
                        ]
                ],
                "t" : [
                        [
                                {
                                        "$maxElement" : 1
                                },
                                {
                                        "$minElement" : 1
                                }
                        ]
                ]
        },
        "server" : "station:27017"
}

Could the problem be somewhere else? What could cause that sudden "sluggishness"? I have several other collections where queries have suddenly become slower as well.

Oh, and another thing. On the other server, the indexes are the same as they were here before I added the new ones. Yes, the collection there is a bit smaller, but it is several times faster.

The point here was in both the index and query ordering selections.

If you look at your earlier output from .explain(), you will see that there is a "min/max" range on the "t" element in your expression. By "moving that to the end" of the evaluation, you allow other filtering elements that are more important to the overall expression (fewer possible matches on "e") to be the main factor before scanning through "t" in basically "everything".
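A toy model may make the effect of the key order easier to see. The sketch below is not MongoDB's B-tree code: it assumes that a scan can only be narrowed by the bounded fields at the front of the index, and it uses made-up data in which station "180" has no rows with e >= 0. The function names and the data are hypothetical.

```javascript
// Toy model of a compound-index scan. This is NOT MongoDB's B-tree code:
// it only counts the keys inside the range that the leading bounded
// fields of an index can narrow down.

// Hypothetical data: station "180" has 1000 rows, all with e < 0;
// station "119" has 1000 rows, all with e >= 0.
function makeDocs() {
  var docs = [];
  for (var t = 0; t < 1000; t++) {
    docs.push({ id: "180", t: t, e: -1 });
    docs.push({ id: "119", t: t, e: t % 5 });
  }
  return docs;
}

// Count the keys visited for { id: id, e: { $gte: minE } } sorted by t,
// given the field order of the index, e.g. ["id", "t", "e"].
function simulateScan(docs, indexFields, id, minE) {
  // The range on "e" narrows the scan only when "e" directly follows "id".
  // With the unbounded "t" in between, every key for the id is visited.
  var eNarrows = indexFields[1] === "e";
  var nscanned = 0;
  var n = 0;
  for (var i = 0; i < docs.length; i++) {
    var d = docs[i];
    if (d.id !== id) continue;            // outside the id prefix: never visited
    if (eNarrows && d.e < minE) continue; // outside the e bounds: never visited
    nscanned++;
    if (d.e >= minE) n++;
  }
  // With "t" last, keys are not in t order across different e values,
  // so the matches have to be sorted afterwards.
  return { nscanned: nscanned, n: n, scanAndOrder: eNarrows };
}

var docs = makeDocs();
var oldPlan = simulateScan(docs, ["id", "t", "e"], "180", 0);
var newPlan = simulateScan(docs, ["id", "e", "t"], "180", 0);
console.log("id,t,e:", oldPlan.nscanned, "keys scanned,", oldPlan.n, "matches");
console.log("id,e,t:", newPlan.nscanned, "keys scanned,", newPlan.n, "matches");
```

For station "180" the model scans 1000 keys with the old order and none with the new one, which has the same shape as the drop from "nscanned" : 660385 to "nscanned" : 0 in the question. The price is "scanAndOrder" : true: matches now have to be sorted by "t" after the scan, which can matter for an id with many matching documents.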

It's a little bit DBA, but in the NoSQL world I do believe this becomes a programmer problem.

You essentially need to construct your "shortest match path" along the selected keys in order to get the most effective scan. That is why the altered query executes much faster.
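One way to check which index gives the shortest scan for a particular "id" is to force each candidate with hint() and compare the explain() output. The helper below is a sketch: the name `compareIndexes` is made up, and it assumes the 2.4-style explain() fields ("nscanned", "millis", "scanAndOrder") shown in the question.

```javascript
// Hypothetical helper for the mongo shell (the name is not from the question).
// `db` is the shell's database handle; `indexNames` are names as listed by
// db.measurement.getIndexes(), e.g. "id_1_t_-1_e_1".
function compareIndexes(db, id, indexNames) {
  var results = [];
  for (var i = 0; i < indexNames.length; i++) {
    var plan = db.measurement
      .find({ id: id, e: { $gte: 0 } })
      .sort({ t: -1 })
      .hint(indexNames[i]) // force this index instead of the planner's choice
      .explain();
    results.push({
      index: indexNames[i],
      nscanned: plan.nscanned,
      millis: plan.millis,
      scanAndOrder: plan.scanAndOrder
    });
  }
  return results;
}
```

In the shell, `compareIndexes(db, "180", ["id_1_t_-1_e_1", "id_1_e_1_t_-1"])` would return one summary per index, so the scan sizes can be compared side by side.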
