MongoDB performance varies

Question

I have a Mongo collection like this:


{
"A2_AboutMe": "",
"A2_Attributes": "|av|nv|",
"A2_Birthday": "",
"A2_DateCreated": "2010-11-25 22:59:00",
"A2_DateLast": "2011-11-18 12:09:36",
"A2_FK_A1_IDPerson": "0",
"A2_Firstname": "José Luis",
"A2_FirstnameC": "Jose Luis",
"A2_Gender": "m",
"A2_IDProfile": "1",
"A2_Keywords": "...|..",
"A2_Lastname": "test - test",
"A2_LastnameC": "_test test",
"A2_Locale": "",
"A2_Middlename": "",
"A2_Name": "José Luis test",
"A2_NameC": "Jose Luis test",
...
}

There are indexes on A2_LastnameC and A2_FirstnameC. The collection holds 3,000,000 docs and takes 8 GB of data storage.

The following query (PHP) takes 3-4 sec:

$collection->find(array("A2_FirstnameC" => new MongoRegex("/jose/i")))->sort(array("A2_LastnameC" => -1))->limit(10)

But sometimes similar queries finish in less than 100 msec.

What can I do to get this performance every time?

The test computer is an i7 with 8 GB RAM (7 GB is used by mongo), running Windows 7.

Answer 1

Indexes can't be used effectively for case-insensitive regular expression queries, nor for non-rooted regular expressions (those not beginning with "^"). Since you already have the A2_Firstname field denormalized as A2_FirstnameC, you could also store that field case-normalized (i.e. either all lower case or all upper case) and avoid needing case-insensitive regular expressions. However, even in this case you will still be doing a full scan of the collection if you are not using a rooted regular expression. Whether or not you can afford to use one depends on your exact use case.
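A minimal sketch of that suggestion, assuming A2_FirstnameC is stored in lower case and without accents. The helper name prefixPattern is hypothetical; it only builds the pattern string, which would then be passed to MongoRegex as in the question.

```php
<?php
// Hypothetical helper: turn user input into a rooted, case-sensitive
// pattern string for a field that is stored in lower case.
function prefixPattern($input)
{
    // Lower-case the input so it matches the case-normalized field.
    // strtolower() only handles ASCII reliably; accents are assumed to have
    // been removed already, as in the A2_FirstnameC values of the question.
    $needle = strtolower(trim($input));

    // Escape regex metacharacters (and the "/" delimiter) so the input is
    // matched literally, then anchor the pattern at the start with "^".
    return '/^' . preg_quote($needle, '/') . '/';
}

$pattern = prefixPattern('Jose');
echo $pattern, "\n";

// With the legacy driver used in the question, the query would then be:
// $collection->find(array('A2_FirstnameC' => new MongoRegex($pattern)))
//            ->sort(array('A2_LastnameC' => -1))
//            ->limit(10);
```

Because the pattern is rooted and has no "i" flag, the server should be able to limit the search to the index entries that start with the given prefix. The trade-off is that it only finds names beginning with the input, not names that merely contain it.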

Answer 2

First of all, an index won't be used for non-prefix, case-insensitive regular expressions. But in the query above, an index can be used for sorting by the A2_LastnameC field, so that part is fast. With the data sorted, MongoDB then needs to get the A2_FirstnameC value and match it against the regexp, stopping once 10 matches are ready (this is also relatively fast because it uses the index to retrieve the data instead of reading whole documents from disk). Depending on the data order, the first 10 documents may happen to match: this is the best case and it will be very fast. In the worst case the matches occur in the last 10 docs, and all the previous index entries have to be scanned first.
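The best and worst cases can be illustrated with a toy model in plain PHP. This is not MongoDB code: the function entriesExamined and the sample data are made up, and the array simply stands in for entries that are already in sort order.

```php
<?php
// Toy model of the scan: walk entries that are already in sort order,
// test each one against the regex, and stop after $limit matches.
// Returns how many entries had to be examined.
function entriesExamined(array $sortedFirstnames, $regex, $limit)
{
    $examined = 0;
    $matches = 0;
    foreach ($sortedFirstnames as $firstname) {
        $examined++;
        if (preg_match($regex, $firstname) === 1) {
            $matches++;
            if ($matches === $limit) {
                break; // enough results, stop scanning
            }
        }
    }
    return $examined;
}

// Made-up data: 1000 entries, 10 of which match /jose/i.
$others = array_fill(0, 990, 'Maria');
$hits   = array_fill(0, 10, 'Jose Luis');

// Best case: all matches come first in sort order.
$bestCase = entriesExamined(array_merge($hits, $others), '/jose/i', 10);
// Worst case: all matches come last in sort order.
$worstCase = entriesExamined(array_merge($others, $hits), '/jose/i', 10);

echo "best case: $bestCase, worst case: $worstCase\n";
// prints "best case: 10, worst case: 1000"
```

The same query therefore costs anywhere between 10 and all entries depending on where the matches sit in the sort order, which would explain why similar queries take very different times.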

How to speed this up? Either use a query that can use an index, like: "A2_FirstnameC" => new MongoRegex("/^jose/"). Or you have to use some kind of full-text search. A simple way would be to split the field (A2_Firstname in your case) into words, normalize them (convert to lower case, replace accents) and store them as an array. An index on the array field can then be used for fast searches.
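The word-array approach could look roughly like the sketch below. The helper nameToWords, its small accent map, and the field name A2_FirstnameWords are all hypothetical, and the source file is assumed to be saved as UTF-8. The driver calls are shown only as comments, since they need a running server.

```php
<?php
// Hypothetical helper: split a name into lower-case, accent-free words.
function nameToWords($name)
{
    // Small illustrative accent map; a real one would cover more characters.
    $accents = array(
        'á' => 'a', 'é' => 'e', 'í' => 'i', 'ó' => 'o', 'ú' => 'u', 'ñ' => 'n',
        'Á' => 'a', 'É' => 'e', 'Í' => 'i', 'Ó' => 'o', 'Ú' => 'u', 'Ñ' => 'n',
    );
    // Replace accented characters first, then lower-case the ASCII result.
    $plain = strtolower(strtr($name, $accents));

    // Split on anything that is not a letter or digit; drop empty pieces.
    return preg_split('/[^a-z0-9]+/', $plain, -1, PREG_SPLIT_NO_EMPTY);
}

$words = nameToWords('José Luis');
print_r($words); // the two words "jose" and "luis"

// With the legacy driver used in the question, storing and querying the
// words might look like this (A2_FirstnameWords is an assumed field name):
// $collection->update(
//     array('A2_IDProfile' => '1'),
//     array('$set' => array('A2_FirstnameWords' => $words))
// );
// $collection->ensureIndex(array('A2_FirstnameWords' => 1));
// $collection->find(array('A2_FirstnameWords' => 'jose'))
//            ->sort(array('A2_LastnameC' => -1))
//            ->limit(10);
```

An equality match on the array field finds documents where any stored word equals "jose", so it replaces the /jose/i search only for whole words. Matching the start of a word would still need a rooted regex on that field.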

Answer 1: 0, 2011-11-30 12:56:24
Answer 2: 0, accepted, 2011-11-30 12:57:13
