
Simple MongoDB query very slow although index is set

I've got a MongoDB collection that holds about 100M documents.

The documents basically look like this:

_id             : ObjectId("asd1234567890")
_reference_1_id : ObjectId("fgh4567890123")
_reference_2_id : ObjectId("jkl7890123456")
name            : "Test1"
id              : "4815162342"
created_time    : Date( 1331882436000 )
_contexts       : ["context1", "context2"]
...

There are some indexes set; here's the output of db.mycoll.getIndexes():

[
{
    "v" : 1,
    "key" : {
        "_id" : 1
    },
    "ns" : "mydb.mycoll",
    "name" : "_id_"
},
{
    "v" : 1,
    "key" : {
        "_reference_1_id" : 1,
        "_reference_2_id" : 1,
        "id" : 1
    },
    "unique" : true,
    "ns" : "mydb.mycoll",
    "name" : "_reference_1_id_1__reference_2_id_1_id_1"
},
{
    "v" : 1,
    "key" : {
        "_reference_1_id" : 1,
        "_reference_2_id" : 1,
        "_contexts" : 1,
        "created_time" : 1
    },
    "ns" : "mydb.mycoll",
    "name" : "_reference_1_id_1__reference_2_id_1__contexts_1_created_time_1"
}
]

When I execute a query like

db.mycoll.find({"_reference_2_id" : ObjectId("jkl7890123456")})

it takes over an hour (!) to finish, whether or not there are any results. Any ideas?

Update: Here's what the output of

db.mycoll.find({"_reference_2_id" : ObjectId("jkl7890123456")}).explain();

looks like:

{
"cursor" : "BasicCursor",
"nscanned" : 99209163,
"nscannedObjects" : 99209163,
"n" : 5007,
"millis" : 5705175,
"nYields" : 17389,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {

}
}

None of your indexes can be used automatically for this query, so MongoDB is doing a full collection scan. The explain output confirms it: the cursor is BasicCursor and nscanned is about 99 million.
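As a rough illustration of how to read that explain output, here is a minimal sketch in plain JavaScript. The numbers are copied from the question; summarizeExplain is a hypothetical helper, not part of MongoDB, and it assumes the legacy explain format shown above, where an index scan is reported as a cursor name starting with "BtreeCursor".

```javascript
// Fields copied from the legacy explain() output in the question.
const explainOutput = {
  cursor: "BasicCursor",
  nscanned: 99209163,
  nscannedObjects: 99209163,
  n: 5007,
  millis: 5705175,
};

// Hypothetical helper (not a MongoDB API): condenses a legacy explain document.
function summarizeExplain(e) {
  return {
    // "BasicCursor" means a collection scan; index scans are reported
    // as "BtreeCursor <index name>" in this explain format.
    usedIndex: e.cursor.startsWith("BtreeCursor"),
    // How many documents were examined for each document returned.
    scannedPerResult: Math.round(e.nscanned / e.n),
    // Wall-clock time in whole minutes.
    minutes: Math.round(e.millis / 60000),
  };
}

const summary = summarizeExplain(explainOutput);
console.log(summary);
```

A large scannedPerResult together with usedIndex being false is the typical signature of a query that no index could serve.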

As mentioned in the docs:

If the first key [of the index] is not present in the query, the index will only be used if hinted explicitly.
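To see how that rule applies to the indexes in the question, here is a small sketch in plain JavaScript. It only encodes the quoted rule (the first key of the index must appear in the query) and is not how the query planner is actually implemented; automaticCandidates is a hypothetical helper, and the key lists are copied from the getIndexes() output.

```javascript
// Key patterns copied from db.mycoll.getIndexes() in the question.
const indexes = [
  { name: "_id_", key: ["_id"] },
  {
    name: "_reference_1_id_1__reference_2_id_1_id_1",
    key: ["_reference_1_id", "_reference_2_id", "id"],
  },
  {
    name: "_reference_1_id_1__reference_2_id_1__contexts_1_created_time_1",
    key: ["_reference_1_id", "_reference_2_id", "_contexts", "created_time"],
  },
];

// Hypothetical helper: names of the indexes whose first key is in the query.
function automaticCandidates(queryFields) {
  return indexes
    .filter((index) => queryFields.includes(index.key[0]))
    .map((index) => index.name);
}

// The query from the question: no index starts with _reference_2_id.
console.log(automaticCandidates(["_reference_2_id"]));

// Adding the leading field makes both compound indexes eligible.
console.log(automaticCandidates(["_reference_1_id", "_reference_2_id"]));
```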

Why

If you have an index on a,b and you search by a alone, the index is used automatically. a is the leading part of every index entry, so the database can seek straight to the matching entries (which is fast) and ignore the rest of each index value.

An index on a,b is inefficient when searching by b alone, because the lookup can no longer be done as a "starts with thisfixedstring" search: entries with a given b are scattered across the whole index.
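The difference can be sketched with a toy model in plain JavaScript. This is only an illustration under simplifying assumptions: the "index" is a sorted array of [a, b] pairs rather than a real B-tree, and findByA and findByB are hypothetical names.

```javascript
// Toy index on (a, b): 10,000 entries kept sorted by a, then by b.
const entries = [];
for (let a = 0; a < 100; a++) {
  for (let b = 0; b < 100; b++) {
    entries.push([a, b]);
  }
}

// Search by the leading field: binary search to the first match,
// then read the contiguous run of entries with that a.
function findByA(a) {
  let lo = 0;
  let hi = entries.length;
  let steps = 0;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    steps++;
    if (entries[mid][0] < a) lo = mid + 1;
    else hi = mid;
  }
  const hits = [];
  while (lo < entries.length && entries[lo][0] === a) {
    hits.push(entries[lo]);
    lo++;
    steps++;
  }
  return { hits, steps };
}

// Search by the second field only: matches are scattered through the
// array, so every entry has to be inspected.
function findByB(b) {
  const hits = [];
  let steps = 0;
  for (const entry of entries) {
    steps++;
    if (entry[1] === b) hits.push(entry);
  }
  return { hits, steps };
}

console.log("by a:", findByA(42).steps, "steps");
console.log("by b:", findByB(42).steps, "steps");
```

Both searches return 100 entries, but the search by b has to touch all 10,000, which is the toy equivalent of walking the entire index.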

So, either:

  • Include _reference_1_id in the query (probably irrelevant)
  • OR add an index on _reference_2_id (if you query by that field often)
  • OR use a hint
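The first two options might look like this in the mongo shell. This is a sketch, not a tested migration: the function names are hypothetical, the functions take the shell's db object as a parameter so they do not depend on a global, and createIndex is the current shell method name (older shells, probably including the one that produced the explain output above, call the same operation ensureIndex).

```javascript
// Option 1: include the leading index field in the query so that the
// existing compound indexes become eligible.
function findByBothReferences(db, reference1Id, reference2Id) {
  return db.mycoll.find({
    _reference_1_id: reference1Id,
    _reference_2_id: reference2Id,
  });
}

// Option 2: add a dedicated, non-unique index on _reference_2_id.
function addReference2Index(db) {
  return db.mycoll.createIndex({ _reference_2_id: 1 });
}
```

In the shell these would be called as addReference2Index(db) and findByBothReferences(db, someRef1, someRef2), where someRef1 and someRef2 are ObjectId values. Building a new index over roughly 100M documents is likely to take a while, so it is worth doing only if the field is queried often.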

Hint

Probably your lowest-cost option right now.

Add a query hint to force the use of your _reference_1_id_1__reference_2_id_1_id_1 index. This is likely to be a lot faster than a full collection scan, but still a lot slower than an index that starts with the field you are using in the query.

i.e.:

db.mycoll
    .find({"_reference_2_id" : ObjectId("jkl7890123456")})
    .hint("_reference_1_id_1__reference_2_id_1_id_1");

Hi, I have much the same problem with a similar amount of data. The documentation says that the indexes used by queries must fit in RAM. I think that is not the case here, so the query must be doing a lot of disk access, first to read the index and then to fetch the values. In your case, a direct collection read will be faster.

EV.

I would try setting a non-unique index on _reference_2_id, because at the moment I suspect you are doing the equivalent of a full table scan: even though the indexes contain _reference_2_id, they won't be used.
