MonoTouch on iPad: How to make text search faster?

Question

I need to do a text search based on user input in a relatively large list (about 37K lines with 50 to 100 chars per line). The search runs after each character is entered, and the result is shown in a UITableView. This is my current code:

if (input.Any(x => Char.IsUpper(x)))
    return _list.Where(x => x.Desc.Contains(input));
else
    return _list.Where(x => x.Desc.ToLower().Contains(input));

It performs okay in the simulator on a MacBook, but it is too slow on an iPad.

One interesting thing I observed is that it takes longer and longer as the input grows. For example, take "examin" as input. It takes about 1 second after entering e, 2 seconds after x, 5 seconds after a, but 28 seconds after m, and so on. Why is that?

I hope there is a simple way to improve it.

Answer 1

Always take care to avoid memory allocations in time-sensitive code.

For example, we often write code that allocates strings without realizing it, e.g.

x => x.Desc.ToLower().Contains(input)

That will allocate a new string for ToLower to return. From your description, this will occur many times. You can easily avoid it by using:

x => x.Desc.IndexOf (input, StringComparison.OrdinalIgnoreCase) != -1

Note: just select the StringComparison.*IgnoreCase value that matches your needs.
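As a rough illustration of the suggestion above, here is a minimal sketch that replaces the LINQ query with a plain loop and an IndexOf call. The `Item` class and the `AllocationFreeSearch.Search` name are hypothetical (only `Desc` appears in the question), and the case rule simply mirrors the question's code: case-sensitive when the input contains an uppercase character, case-insensitive otherwise.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical item type; only the Desc member appears in the question.
public class Item
{
    public string Desc;
}

public static class AllocationFreeSearch
{
    public static List<Item> Search(IList<Item> list, string input)
    {
        // Same rule as the question: any uppercase character in the input
        // switches the search to case-sensitive matching.
        bool hasUpper = false;
        foreach (char c in input)
        {
            if (Char.IsUpper(c)) { hasUpper = true; break; }
        }
        StringComparison mode = hasUpper
            ? StringComparison.Ordinal
            : StringComparison.OrdinalIgnoreCase;

        var results = new List<Item>();
        for (int i = 0; i < list.Count; i++)
        {
            // IndexOf with a StringComparison compares in place, so no
            // lower-cased copy of Desc is created for each item.
            if (list[i].Desc.IndexOf(input, mode) >= 0)
                results.Add(list[i]);
        }
        return results;
    }
}
```

Whether this is fast enough on the device still has to be measured; the sketch only removes the per-item string allocation.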

LINQ is also nice, but it hides allocations in many cases. Maybe not in your case, but measuring is key to making things faster. If allocations are not the problem, using another algorithm (as suggested in another answer) could give you much better results (but keep the allocations in mind ;-)

UPDATE:

Mono's Contains(string) will call, after a few checks, the following:

CultureInfo.CurrentCulture.CompareInfo.IndexOf (this, value, 0, length, CompareOptions.Ordinal);

Given your ToLower requirement, this means that StringComparison.OrdinalIgnoreCase is the perfect (i.e. identical) match for your existing code, which did not do any culture-specific comparison.

Answer 2

Generally I've found that 'contains' operations are not preferable for search, so I'd recommend you take a look at the Mastering Core Data session video (login required) on the WWDC 2010 page, around the 10-minute mark. Apple knows that 'contains' is terrible with SQLite on mobile devices, and you can essentially do what Apple does to sort of "hack" full-text search (FTS) on the version of SQLite they ship.

Essentially they do prefix matching by creating a table like:

[[ pk_id || input || normalized_input ]]

Here input and normalized_input are both indexed explicitly. They then prefix match against the normalized value. For instance, if a user is searching for 'snuggles' and has typed 'snu' so far, the prefix matching query would look like:

normalized_input >= 'snu' and normalized_input < 'snv'

Not sure if this translates to your use case, but I thought it was worth mentioning. Hope it's helpful!
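One way the idea might carry over to an in-memory list is sketched below. It assumes the lines are normalized (lower-cased) and sorted ordinally once up front, which plays the role of the indexed normalized_input column; a binary search then finds the start of the matching range. The `PrefixRangeSearch` class and its method names are hypothetical, and, like the SQL version, it matches only lines that start with the typed text, not lines that merely contain it.

```csharp
using System;
using System.Collections.Generic;

public static class PrefixRangeSearch
{
    // Build once: lower-cased copies sorted in ordinal order.
    // This stands in for the indexed normalized_input column.
    public static string[] BuildIndex(IEnumerable<string> lines)
    {
        var normalized = new List<string>();
        foreach (string line in lines)
            normalized.Add(line.ToLowerInvariant());
        normalized.Sort(StringComparer.Ordinal);
        return normalized.ToArray();
    }

    // Index of the first entry that is >= key in ordinal order.
    static int LowerBound(string[] sorted, string key)
    {
        int lo = 0, hi = sorted.Length;
        while (lo < hi)
        {
            int mid = lo + (hi - lo) / 2;
            if (String.CompareOrdinal(sorted[mid], key) < 0) lo = mid + 1;
            else hi = mid;
        }
        return lo;
    }

    // All entries with the given prefix sit in one contiguous run that
    // begins at LowerBound(prefix); this is the same range as
    // ">= 'snu' and < 'snv'" in the query above.
    public static List<string> StartingWith(string[] sorted, string prefix)
    {
        string p = prefix.ToLowerInvariant();
        var results = new List<string>();
        for (int i = LowerBound(sorted, p); i < sorted.Length; i++)
        {
            if (!sorted[i].StartsWith(p, StringComparison.Ordinal)) break;
            results.Add(sorted[i]);
        }
        return results;
    }
}
```

The index is built once, so the per-keystroke cost is a binary search plus the size of the result, rather than a scan of all 37K lines.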

Answer 3

You need to use a trie. See http://en.wikipedia.org/wiki/Trie
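For readers unfamiliar with the structure, a minimal trie sketch follows. The `Trie` and `TrieNode` classes are hypothetical illustrations, not part of MonoTouch. Note that a plain trie answers prefix queries; to support the question's "contains" search you would have to insert more keys per line (for example each word, or each suffix), which is an assumption about how the answer is meant to be applied and costs extra memory.

```csharp
using System;
using System.Collections.Generic;

public class TrieNode
{
    public readonly Dictionary<char, TrieNode> Children = new Dictionary<char, TrieNode>();
    // Indexes of the lines whose key passes through this node.
    public readonly List<int> Lines = new List<int>();
}

public class Trie
{
    readonly TrieNode root = new TrieNode();

    // Walk the key one character at a time, creating nodes as needed and
    // recording the line index at every node along the path.
    public void Add(string key, int lineIndex)
    {
        TrieNode node = root;
        foreach (char c in key)
        {
            TrieNode next;
            if (!node.Children.TryGetValue(c, out next))
            {
                next = new TrieNode();
                node.Children[c] = next;
            }
            node = next;
            node.Lines.Add(lineIndex);
        }
    }

    // Follow the prefix down the tree; the node reached holds the indexes of
    // every line added with a key starting with that prefix. An empty prefix
    // returns an empty list, because nothing is recorded at the root.
    public IList<int> FindByPrefix(string prefix)
    {
        TrieNode node = root;
        foreach (char c in prefix)
        {
            if (!node.Children.TryGetValue(c, out node))
                return new List<int>();
        }
        return node.Lines;
    }
}
```

Lookup cost depends on the length of the typed text rather than on the number of lines, at the price of storing line indexes at every node.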


Answer 1: 5, 2011-12-09 13:32:30

Answer 2: 1, 2011-12-09 17:55:45

Answer 3: 0, 2011-12-09 06:02:26
