
mongodb $all and $in very slow even on indexed fields

I have a collection of about 80 million documents, each storing an array of tags in the tags field, e.g.:

{text: "blah blah blah...", tags: ["car", "auto", "automobile"]}

The tags field is indexed, so queries like this are almost instant:

 db.documents.find({tags:"car"})

However, the following queries are all very slow, taking several minutes to complete:

 db.documents.find({tags:{$all:["car","phone"]}})
 db.documents.find({tags:{$in:["car","auto"]}})

The problem persists even if the array only has a single item:

 db.documents.find({tags:{$all:["car"]}})  //very slow too

I thought $all and $in should be very fast because tags is indexed, but apparently that is not the case. Why?

It turns out this is a known bug in MongoDB which hasn't yet been fixed as of version 2.2.

MongoDB does not perform index intersection when searching for multiple entries using $all. Only the first item in the array is looked up using the index, and every document matched by that lookup is then scanned to filter the results.

For example, in the query db.documents.find({tags:{$all:["car","phone"]}}) all documents containing the tag "car" need to be retrieved and scanned. Since the collection in question contains over a hundred thousand documents tagged with "car", the slowdown is not surprising.
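The behaviour described above can be pictured with a toy model. The sketch below is plain JavaScript, not MongoDB code; buildTagIndex and findAll are hypothetical names, and the four sample documents are made up for illustration. It only mimics the strategy this answer describes: use the index for the first item of $all, then scan every candidate.

```javascript
// Toy model only: a plain-JavaScript stand-in, not MongoDB internals.
// buildTagIndex and findAll are hypothetical names for illustration.

// Map each tag to the positions of the documents that contain it,
// a simplified picture of an index on the tags field.
function buildTagIndex(docs) {
  const index = new Map();
  docs.forEach((doc, pos) => {
    for (const tag of new Set(doc.tags)) {
      if (!index.has(tag)) index.set(tag, []);
      index.get(tag).push(pos);
    }
  });
  return index;
}

// Evaluate {tags: {$all: wanted}} as the answer describes:
// index lookup for the first tag only, then a scan of those candidates.
function findAll(docs, index, wanted) {
  const candidates = index.get(wanted[0]) || [];
  let scanned = 0;
  const results = [];
  for (const pos of candidates) {
    scanned += 1;
    const doc = docs[pos];
    if (wanted.every((tag) => doc.tags.includes(tag))) results.push(doc);
  }
  return { results, scanned };
}

// Made-up sample data.
const docs = [
  { text: "a", tags: ["car", "auto"] },
  { text: "b", tags: ["car", "phone"] },
  { text: "c", tags: ["car"] },
  { text: "d", tags: ["phone"] },
];
const index = buildTagIndex(docs);
const carFirst = findAll(docs, index, ["car", "phone"]);
const phoneFirst = findAll(docs, index, ["phone", "car"]);
console.log(carFirst.scanned, phoneFirst.scanned); // 3 2
```

Both calls return the same single document, but the number of documents scanned depends entirely on which tag is listed first.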

Worse, MongoDB doesn't even perform the simple optimization of selecting the least represented item in the $all array for the index lookup. If there are 100,000 documents tagged "car" and 10 documents tagged "phone", MongoDB will still need to scan 100,000 documents to return results for {$all:["car", "phone"]}.
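If only the first item is used for the index lookup, one possible client-side workaround is to put the rarest tag first yourself. The following is a minimal sketch under that assumption; orderByRarity is a hypothetical helper, not a MongoDB API, and the counts are the figures from the example above. Whether it actually helps on a given server version should be checked with explain().

```javascript
// Hypothetical helper, not a MongoDB API: sort the tags of an $all query
// so the tag with the fewest matching documents comes first.
// countForTag is any function that returns the document count for a tag.
function orderByRarity(tags, countForTag) {
  return tags
    .map((tag) => ({ tag, count: countForTag(tag) }))
    .sort((a, b) => a.count - b.count)
    .map((entry) => entry.tag);
}

// Stub counts taken from the example in the answer.
const counts = { car: 100000, phone: 10 };
const ordered = orderByRarity(["car", "phone"], (tag) => counts[tag] || 0);
console.log(ordered); // [ 'phone', 'car' ]

// In the mongo shell the counter could be, for instance,
//   function (tag) { return db.documents.count({tags: tag}); }
// and the reordered query would be
//   db.documents.find({tags: {$all: ordered}})
```

The extra count per tag is itself an indexed lookup, so it should be cheap compared with scanning the larger candidate set.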

See also: https://jira.mongodb.org/browse/SERVER-1000

I just want to add that $in is fast. In fact, for a single criterion or keyword, $in is equivalent to $all, yet $in is fast and $all is slow.

So use $in.
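For the single-item case, the substitution could be done mechanically before the query is sent. This is only a sketch: allToInIfSingle is a hypothetical helper, and it assumes plain scalar values in the array. It rewrites the condition only when $all is the sole operator and has exactly one element, since that is the only case where the two operators select the same documents.

```javascript
// Hypothetical helper, not part of MongoDB: rewrite a single-element
// $all condition as $in, and return anything else unchanged.
function allToInIfSingle(condition) {
  const isPlainObject = condition !== null && typeof condition === "object";
  if (!isPlainObject) return condition;
  const keys = Object.keys(condition);
  const values = condition.$all;
  // Only rewrite when $all is the sole operator and holds exactly one value.
  if (keys.length === 1 && Array.isArray(values) && values.length === 1) {
    return { $in: values.slice() };
  }
  return condition;
}

const single = allToInIfSingle({ $all: ["car"] });
const multi = allToInIfSingle({ $all: ["car", "phone"] });
console.log(JSON.stringify(single)); // {"$in":["car"]}
console.log(JSON.stringify(multi));  // {"$all":["car","phone"]}

// Usage in the mongo shell would look like
//   db.documents.find({tags: allToInIfSingle({$all: ["car"]})})
```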

