MongoDB $all and $in are slow even on an indexed field

Question

I have a collection of about 80 million documents, each storing an array of tags in the tags field, e.g.:

{text: "blah blah blah...", tags: ["car", "auto", "automobile"]}

The tags field is indexed, so queries like this are almost instant:

 db.documents.find({tags:"car"})

However, the following queries are all very slow, taking several minutes to complete:

 db.documents.find({tags:{$all:["car","phone"]}})
 db.documents.find({tags:{$in:["car","auto"]}})

The problem persists even if the array contains only a single item:

 db.documents.find({tags:{$all:["car"]}})  //very slow too

I thought $all and $in would be very fast because tags is indexed, but apparently that is not the case. Why?

Answer 1

It turns out this is a known bug in MongoDB that has not been fixed as of version 2.2.

MongoDB does not perform index intersection when searching for multiple entries using $all. Only the first item in the array is looked up using the index, and all documents matched by that lookup are then scanned to filter the results.

For example, in the query db.documents.find({tags:{$all:["car","phone"]}}), all documents containing the tag "car" need to be retrieved and scanned. Since the collection in question contains over a hundred thousand documents tagged with "car", the slowdown is not surprising.
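To make the cost concrete, here is a toy in-memory model of the behaviour described above. It is plain JavaScript, not MongoDB code, and the function name findAllModel and the sample documents are made up for illustration. It resolves only the first tag through an index and then checks every candidate against the full tag list, returning how many documents had to be scanned.

```javascript
// Toy model only: mimics "index lookup on the first $all item, then scan".
// It is an assumption-laden sketch, not how the server is implemented.
function findAllModel(docs, tags) {
  // Build a multikey-style index: tag -> positions of documents carrying it.
  var index = Object.create(null);
  docs.forEach(function (doc, i) {
    doc.tags.forEach(function (tag) {
      var list = index[tag] || (index[tag] = []);
      // Skip repeats so a document listing a tag twice is indexed once.
      if (list[list.length - 1] !== i) {
        list.push(i);
      }
    });
  });

  // Only the first tag is resolved through the index.
  var candidates = index[tags[0]] || [];

  // Every candidate is then fetched and checked against all requested tags.
  var matches = candidates.filter(function (i) {
    return tags.every(function (tag) {
      return docs[i].tags.indexOf(tag) !== -1;
    });
  }).map(function (i) {
    return docs[i];
  });

  return { scanned: candidates.length, matches: matches };
}

// Hypothetical sample data: three "car" documents, two "phone" documents.
var sampleDocs = [
  { tags: ["car"] },
  { tags: ["car", "auto"] },
  { tags: ["car", "phone"] },
  { tags: ["phone"] }
];

var carFirst = findAllModel(sampleDocs, ["car", "phone"]);
var phoneFirst = findAllModel(sampleDocs, ["phone", "car"]);
```

In this model both orderings return the same single match, but the number of scanned candidates depends entirely on which tag comes first.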

Worse, MongoDB doesn't even perform the simple optimization of selecting the least represented item in the $all array for the index lookup. If there are 100000 documents tagged "car" and 10 documents tagged "phone", MongoDB will still need to scan 100000 documents to return results for {$all:["car", "phone"]}.
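If the server really does use only the first $all item for the index lookup, one possible client-side workaround is to put the rarest tag first yourself. The sketch below is an assumption, not a confirmed fix: orderTagsByRarity is a hypothetical helper, and countFn is whatever function you supply to count documents per tag. Check the effect with explain() on your own data before relying on it.

```javascript
// Hypothetical helper: returns a copy of `tags` sorted so the rarest tag is first.
// `countFn(tag)` must return how many documents carry that tag.
function orderTagsByRarity(tags, countFn) {
  // Count each tag once up front rather than repeatedly inside the comparator.
  var counted = tags.map(function (tag) {
    return { tag: tag, count: countFn(tag) };
  });
  counted.sort(function (a, b) {
    return a.count - b.count;
  });
  return counted.map(function (entry) {
    return entry.tag;
  });
}

// In the mongo shell this might be used as follows (assumes the `documents`
// collection from the question; the single-tag counts go through the index):
//
//   var tags = orderTagsByRarity(["car", "phone"], function (t) {
//     return db.documents.count({ tags: t });
//   });
//   db.documents.find({ tags: { $all: tags } });
```

The extra count queries cost one indexed lookup per tag, which should be cheap next to scanning a hundred thousand documents, but that trade-off is worth measuring.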

See also: https://jira.mongodb.org/browse/SERVER-1000

Answer 2

I just want to add that $in is fast. In fact, for a single criterion or keyword, $in is equivalent to $all, yet $in is fast and $all is slow.

So use $in.
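A small sketch of how that advice could be applied when building queries: swap $all for $in only when there is exactly one tag, since with several tags $in matches documents containing any of them while $all requires all of them. buildAllTagsQuery is a hypothetical helper name, and whether $in is actually faster on your server version is this answer's claim, not something the snippet verifies.

```javascript
// Hypothetical helper: builds a "document has all of these tags" query,
// using $in for the single-tag case where it means the same thing as $all.
function buildAllTagsQuery(tags) {
  if (tags.length === 1) {
    // One tag: $in and $all select the same documents.
    return { tags: { $in: tags.slice() } };
  }
  // Several tags: $in would mean "any of", so $all has to stay.
  return { tags: { $all: tags.slice() } };
}

// Possible mongo shell usage (assumes the `documents` collection from the question):
//   db.documents.find(buildAllTagsQuery(["car"]));
```

For a single tag, the plain equality form {tags: "car"} from the question is another option, and the asker reports it as almost instant.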

Answer 1
10 2012-10-06 16:49:01

Answer 2
-1 2012-11-14 10:34:41
