Fastest way to search in a string collection

Problem:

I have a text file of around 120,000 users (strings) which I would like to store in a collection and later search.

The search will run every time the user changes the text of a TextBox, and the result should be the strings that contain the text in the TextBox.

I don't have to change the list, just pull the results and put them in a ListBox.

What I've tried so far:

I tried two different collections/containers, into which I load the string entries from an external text file (once, of course):

  1. List<string> allUsers;
  2. HashSet<string> allUsers;

With the following LINQ query:

allUsers.Where(item => item.Contains(textBox_search.Text)).ToList();

My search event (fires when the user changes the search text):

private void textBox_search_TextChanged(object sender, EventArgs e)
{
    if (textBox_search.Text.Length > 2)
    {
        listBox_choices.DataSource = allUsers.Where(item => item.Contains(textBox_search.Text)).ToList();
    }
    else
    {
        listBox_choices.DataSource = null;
    }
}

Results:

Both gave me a poor response time (around 1-3 seconds between each key press).

Question:

Where do you think my bottleneck is? The collection I've used? The search method? Both?

How can I get better performance and more fluent functionality?

You could consider doing the filtering task on a background thread which would invoke a callback method when it's done, or simply restart filtering if the input is changed.

The general idea is to be able to use it like this:

public partial class YourForm : Form
{
    private readonly BackgroundWordFilter _filter;

    public YourForm()
    {
        InitializeComponent();

        // setup the background worker to return no more than 10 items,
        // and to set ListBox.DataSource when results are ready

        _filter = new BackgroundWordFilter
        (
            items: GetDictionaryItems(),
            maxItemsToMatch: 10,
            callback: results => 
              this.Invoke(new Action(() => listBox_choices.DataSource = results))
        );
    }

    private void textBox_search_TextChanged(object sender, EventArgs e)
    {
        // this will update the background worker's "current entry"
        _filter.SetCurrentEntry(textBox_search.Text);
    }
}

A rough sketch would be something like:

public class BackgroundWordFilter : IDisposable
{
    private readonly List<string> _items;
    private readonly AutoResetEvent _signal = new AutoResetEvent(false);
    private readonly Thread _workerThread;
    private readonly int _maxItemsToMatch;
    private readonly Action<List<string>> _callback;

    private volatile bool _shouldRun = true;
    private volatile string _currentEntry = null;

    public BackgroundWordFilter(
        List<string> items,
        int maxItemsToMatch,
        Action<List<string>> callback)
    {
        _items = items;
        _callback = callback;
        _maxItemsToMatch = maxItemsToMatch;

        // start the long-lived background thread
        _workerThread = new Thread(WorkerLoop)
        {
            IsBackground = true,
            Priority = ThreadPriority.BelowNormal
        };

        _workerThread.Start();
    }

    public void SetCurrentEntry(string currentEntry)
    {
        // set the current entry and signal the worker thread
        _currentEntry = currentEntry;
        _signal.Set();
    }

    void WorkerLoop()
    {
        while (_shouldRun)
        {
            // wait here until there is a new entry
            _signal.WaitOne();
            if (!_shouldRun)
                return;

            var entry = _currentEntry;
            var results = new List<string>();

            // if there is nothing to process,
            // return an empty list
            if (string.IsNullOrEmpty(entry))
            {
                _callback(results);
                continue;
            }

            // do the search in a loop to
            // allow early termination when the current entry
            // is changed on a different thread
            foreach (var i in _items)
            {
                // if matched, add to the list of results
                if (i.Contains(entry))
                    results.Add(i);

                // check if the current entry was updated in the meantime,
                // or we found enough items
                if (entry != _currentEntry || results.Count >= _maxItemsToMatch)
                    break;
            }

            if (entry == _currentEntry)
                _callback(results);
        }
    }

    public void Dispose()
    {
        // we are using AutoResetEvent and a background thread
        // and therefore must dispose it explicitly
        Dispose(true);
    }

    private void Dispose(bool disposing)
    {
        if (!disposing)
            return;

        // shutdown the thread
        if (_workerThread.IsAlive)
        {
            _shouldRun = false;
            _currentEntry = null;
            _signal.Set();
            _workerThread.Join();
        }

        // if targeting .NET 3.5 or older, we have to
        // use the explicit IDisposable implementation
        (_signal as IDisposable).Dispose();
    }
}

Also, you should actually dispose the _filter instance when the parent Form is disposed. This means you should open and edit your Form's Dispose method (inside the YourForm.Designer.cs file) to look something like:

// inside "xxxxxx.Designer.cs"
protected override void Dispose(bool disposing)
{
    if (disposing)
    {
        if (_filter != null)
            _filter.Dispose();

        // this part is added by Visual Studio designer
        if (components != null)
            components.Dispose();
    }

    base.Dispose(disposing);
}

On my machine, it works pretty quickly, so you should test and profile this before going for a more complex solution.

That being said, a "more complex solution" would possibly be to store the last couple of results in a dictionary, and then only filter them if it turns out that the new entry differs by only the first or last character.

I've done some testing, and searching a list of 120,000 items and populating a new list with the entries takes a negligible amount of time (about 1/50th of a second even if all strings are matched).

The problem you're seeing must therefore be coming from the populating of the data source, here:

listBox_choices.DataSource = ...

I suspect you are simply putting too many items into the listbox.

Perhaps you should try limiting it to the first 20 entries, like so:

listBox_choices.DataSource = allUsers.Where(item => item.Contains(textBox_search.Text))
    .Take(20).ToList();

Also note (as others have pointed out) that you are accessing the TextBox.Text property for each item in allUsers. This can easily be fixed as follows:

string target = textBox_search.Text;
listBox_choices.DataSource = allUsers.Where(item => item.Contains(target))
    .Take(20).ToList();

However, I timed how long it takes to access TextBox.Text 500,000 times and it only took 0.7 seconds, far less than the 1-3 seconds mentioned in the OP. Still, this is a worthwhile optimisation.

Use a suffix tree as an index. Or rather, just build a sorted dictionary that associates every suffix of every name with the list of corresponding names.

For input:

Abraham
Barbara
Abram

The structure would look like:

a -> Barbara
abraham -> Abraham
abram -> Abram
aham -> Abraham
am -> Abraham, Abram
ara -> Barbara
arbara -> Barbara
bara -> Barbara
barbara -> Barbara
braham -> Abraham
bram -> Abram
ham -> Abraham
m -> Abraham, Abram
ra -> Barbara
raham -> Abraham
ram -> Abram
rbara -> Barbara

Search algorithm

Assume the user inputs "bra".

  1. Bisect the dictionary on the user input to find the user input or the position where it could go. This way we find "barbara", the last key lower than "bra". The position right after it is the lower bound for "bra". The search takes logarithmic time.
  2. Iterate from the found key onwards until the user input no longer matches. This would give "braham" -> Abraham and "bram" -> Abram.
  3. Concatenate the iteration results (Abraham, Abram) and output them.

Such trees are designed for quick substring search. Their performance is close to O(log n). I believe this approach will work fast enough to be used by the GUI thread directly. Moreover, it will work faster than the threaded solution due to the absence of synchronization overhead.
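The sorted-suffix idea can be sketched in C# roughly as follows. This is a minimal illustration, not a tuned implementation: the class name SuffixIndex and its members are hypothetical, matching is made case-insensitive by lower-casing (an assumption), and a pair of sorted arrays stands in for the "sorted dictionary".

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: every suffix of every name, sorted, searched by bisection.
public sealed class SuffixIndex
{
    private readonly string[] _suffixes; // lower-cased suffixes in ordinal order
    private readonly string[] _names;    // _names[i] is the name _suffixes[i] came from

    public SuffixIndex(IEnumerable<string> names)
    {
        var entries = new List<KeyValuePair<string, string>>();
        foreach (var name in names)
        {
            var lower = name.ToLowerInvariant();
            for (int i = 0; i < lower.Length; i++)
                entries.Add(new KeyValuePair<string, string>(lower.Substring(i), name));
        }
        entries.Sort((a, b) => string.CompareOrdinal(a.Key, b.Key));

        _suffixes = new string[entries.Count];
        _names = new string[entries.Count];
        for (int i = 0; i < entries.Count; i++)
        {
            _suffixes[i] = entries[i].Key;
            _names[i] = entries[i].Value;
        }
    }

    public List<string> Search(string input)
    {
        var results = new List<string>();
        if (string.IsNullOrEmpty(input))
            return results;
        var query = input.ToLowerInvariant();

        // Step 1: bisect to the lower bound, the first suffix that is >= query.
        int lo = 0, hi = _suffixes.Length;
        while (lo < hi)
        {
            int mid = lo + (hi - lo) / 2;
            if (string.CompareOrdinal(_suffixes[mid], query) < 0)
                lo = mid + 1;
            else
                hi = mid;
        }

        // Steps 2 and 3: walk forward while the suffix still starts with the query,
        // collecting each name once.
        var seen = new HashSet<string>();
        for (int i = lo; i < _suffixes.Length; i++)
        {
            if (!_suffixes[i].StartsWith(query, StringComparison.Ordinal))
                break;
            if (seen.Add(_names[i]))
                results.Add(_names[i]);
        }
        return results;
    }
}
```

With the three names above, new SuffixIndex(new[] { "Abraham", "Barbara", "Abram" }).Search("bra") yields Abraham and Abram. The price is memory: every name contributes one entry per character, so it is worth measuring the footprint on the real 120,000-entry file before committing to this.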

You need either a text search engine (like Lucene.Net), or a database (you may consider an embedded one like SQL CE, SQLite, etc.). In other words, you need an indexed search. Hash-based search isn't applicable here, because you are searching for a substring, while hash-based search is good for finding an exact value.

Otherwise it will be an iterative search, looping through the collection.

It might also be useful to have a "debounce" type of event. This differs from throttling in that it waits a period of time (for example, 200 ms) for changes to finish before firing the event.

See "Debounce and Throttle: a visual explanation" for more information about debouncing. I appreciate that this article is focused on JavaScript rather than C#, but the principle applies.

The advantage of this is that it doesn't search while you're still entering your query. It should then stop trying to perform two searches at once.
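A debounce could be sketched in C# along these lines. The Debouncer class and its Run method are hypothetical names for illustration, and the sketch assumes .NET 4.5 or later for async/await and Task.Delay.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: runs only the last action requested within a quiet period.
public sealed class Debouncer
{
    private readonly TimeSpan _delay;
    private readonly object _gate = new object();
    private CancellationTokenSource _cts;

    public Debouncer(TimeSpan delay)
    {
        _delay = delay;
    }

    // Cancels any pending request and schedules the new one.
    public Task Run(Action action)
    {
        CancellationTokenSource cts;
        lock (_gate)
        {
            if (_cts != null)
                _cts.Cancel();
            _cts = cts = new CancellationTokenSource();
        }
        return RunAfterDelay(action, cts.Token);
    }

    private async Task RunAfterDelay(Action action, CancellationToken token)
    {
        try
        {
            await Task.Delay(_delay, token);
        }
        catch (OperationCanceledException)
        {
            return; // a newer request replaced this one
        }

        if (!token.IsCancellationRequested)
            action();
    }
}
```

In the TextChanged handler one would call something like _debouncer.Run(() => ApplyFilter(textBox_search.Text)), where ApplyFilter is a placeholder for the actual search. When Run is called from the UI thread, the continuation after the await normally resumes on that thread, so the action can touch controls; that behaviour is worth verifying in your own application.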

Run the search on another thread, and show some loading animation or a progress bar while that thread is running.

You may also try to parallelize the LINQ query.

var queryResults = strings.AsParallel().Where(item => item.Contains("1")).ToList();

Here is a benchmark that demonstrates the performance advantages of AsParallel():

{
    IEnumerable<string> queryResults;
    bool useParallel = true;

    var strings = new List<string>();

    for (int i = 0; i < 2500000; i++)
        strings.Add(i.ToString());

    var stp = new Stopwatch();

    stp.Start();

    if (useParallel)
        queryResults = strings.AsParallel().Where(item => item.Contains("1")).ToList();
    else
        queryResults = strings.Where(item => item.Contains("1")).ToList();

    stp.Stop();

    Console.WriteLine("useParallel: {0}\r\nTime Elapsed: {1}", useParallel, stp.ElapsedMilliseconds);
}

Update:

I did some profiling.

(Update 3)

  • List content: Numbers generated from 0 to 2,499,999
  • Filter text: 123 (20,477 results)
  • Core i5-2500, Win7 64bit, 8GB RAM
  • VS2012 + JetBrains dotTrace

The initial test run for 2,500,000 records took me 20,000 ms.

Number one culprit is the call to textBox_search.Text inside Contains. This makes a call for each element to the expensive get_WindowText method of the textbox. Simply changing the code to:

    var text = textBox_search.Text;
    listBox_choices.DataSource = allUsers.Where(item => item.Contains(text)).ToList();

reduced the execution time to 1,858 ms.

Update 2:

The other two significant bottlenecks are now the call to string.Contains (about 45% of the execution time) and the update of the listbox elements in set_Datasource (30%).

We could make a trade-off between speed and memory usage by creating a suffix tree, as Basilevs has suggested, to reduce the number of necessary comparisons and move some processing time from the search after a key press to the loading of the names from the file, which might be preferable for the user.

To increase the performance of loading the elements into the listbox, I would suggest loading only the first few elements and indicating to the user that further elements are available. This way you give the user feedback that there are results available, so they can refine their search by entering more letters or load the complete list with the press of a button.

Using BeginUpdate and EndUpdate made no change in the execution time of set_Datasource.

As others have noted here, the LINQ query itself runs quite fast. I believe your bottleneck is the updating of the listbox itself. You could try something like:

if (textBox_search.Text.Length > 2)
{
    listBox_choices.BeginUpdate();
    listBox_choices.DataSource = allUsers.Where(item => item.Contains(textBox_search.Text)).ToList();
    listBox_choices.EndUpdate();
}

I hope this helps.

Assuming you are only matching by prefixes, the data structure you are looking for is called a trie, also known as a "prefix tree". The Enumerable.Where method that you're using now has to iterate through all items in your dictionary on each access.

This thread shows how to create a trie in C#.
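For illustration, a minimal trie might look like the sketch below. The Trie class and its Add and StartingWith members are hypothetical names, matching is case-sensitive, and only prefix lookups are supported (not the substring matching asked about in the question).

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical minimal prefix tree; one node per character.
public sealed class Trie
{
    private sealed class Node
    {
        public readonly Dictionary<char, Node> Children = new Dictionary<char, Node>();
        public bool IsWord;
    }

    private readonly Node _root = new Node();

    public void Add(string word)
    {
        var node = _root;
        foreach (char c in word)
        {
            Node next;
            if (!node.Children.TryGetValue(c, out next))
            {
                next = new Node();
                node.Children.Add(c, next);
            }
            node = next;
        }
        node.IsWord = true;
    }

    // Returns up to maxResults stored words that begin with the given prefix.
    public List<string> StartingWith(string prefix, int maxResults)
    {
        var results = new List<string>();
        var node = _root;
        foreach (char c in prefix)
        {
            if (!node.Children.TryGetValue(c, out node))
                return results; // no stored word has this prefix
        }
        Collect(node, new StringBuilder(prefix), results, maxResults);
        return results;
    }

    // Depth-first walk below the prefix node, rebuilding each word as it goes.
    private static void Collect(Node node, StringBuilder current, List<string> results, int maxResults)
    {
        if (results.Count >= maxResults)
            return;
        if (node.IsWord)
            results.Add(current.ToString());
        foreach (var pair in node.Children)
        {
            current.Append(pair.Key);
            Collect(pair.Value, current, results, maxResults);
            current.Length--;
        }
    }
}
```

The lookup cost depends on the length of the prefix and the number of results returned rather than on the size of the collection. The order of the returned words follows the dictionary's enumeration order, so sort the result if the display order matters.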

The WinForms ListBox control really is your enemy here. It will be slow to load the records and the ScrollBar will fight you to show all 120,000 records.

Try using an old-fashioned DataGridView data-sourced to a DataTable with a single column [UserName] to hold your data:

private DataTable dt;

public Form1() {
  InitializeComponent();

  dt = new DataTable();
  dt.Columns.Add("UserName");
  for (int i = 0; i < 120000; ++i){
    DataRow dr = dt.NewRow();
    dr[0] = "user" + i.ToString();
    dt.Rows.Add(dr);
  }
  dgv.AutoSizeColumnsMode = DataGridViewAutoSizeColumnsMode.Fill;
  dgv.AllowUserToAddRows = false;
  dgv.AllowUserToDeleteRows = false;
  dgv.RowHeadersVisible = false;
  dgv.DataSource = dt;
}

Then use a DataView in the TextChanged event of your TextBox to filter the data:

private void textBox1_TextChanged(object sender, EventArgs e) {
  DataView dv = new DataView(dt);
  dv.RowFilter = string.Format("[UserName] LIKE '%{0}%'", textBox1.Text);
  dgv.DataSource = dv;
}
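One caveat with the handler above: the text is pasted into the filter expression as typed, so a quote, bracket or wildcard character in the search box changes or breaks the expression. A small escaping helper, sketched here under the hypothetical name RowFilterHelper.EscapeLikeValue, follows the DataColumn.Expression rules as I understand them: wildcards and brackets are wrapped in brackets, and a single quote is doubled.

```csharp
using System.Text;

// Hypothetical helper for building a safe LIKE operand for DataView.RowFilter.
public static class RowFilterHelper
{
    public static string EscapeLikeValue(string value)
    {
        var sb = new StringBuilder(value.Length);
        foreach (char c in value)
        {
            switch (c)
            {
                case '*':
                case '%':
                case '[':
                case ']':
                    // wildcard and bracket characters must be enclosed in brackets
                    sb.Append('[').Append(c).Append(']');
                    break;
                case '\'':
                    // a single quote inside a string literal is written twice
                    sb.Append("''");
                    break;
                default:
                    sb.Append(c);
                    break;
            }
        }
        return sb.ToString();
    }
}
```

The filter line would then read dv.RowFilter = string.Format("[UserName] LIKE '%{0}%'", RowFilterHelper.EscapeLikeValue(textBox1.Text));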

First I would change how ListControl sees your data source: you're converting the resulting IEnumerable<string> to a List<string>. Especially when you have typed only a few characters, this may be inefficient (and unneeded). Do not make expensive copies of your data.

  • I would wrap the .Where() result in a collection that implements only what is required from IList (search). This will save you from creating a new big list for each character typed.
  • As an alternative I would avoid LINQ and write something more specific (and optimized). Keep your list in memory and build an array of matched indices; reuse the array so you do not have to reallocate it for each search.

The second step is to not search the big list when a small one is enough. When the user has typed "ab" and then adds "c", you do not need to search the big list again; searching the already filtered list is enough (and faster). Refine the search whenever possible; do not perform a full search each time.
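The refinement step can be sketched like this. IncrementalFilter is a hypothetical name, and the sketch assumes the previous result list was complete (not truncated with something like Take(20)); otherwise refining it would miss matches.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: reuses the previous result list when the query only grew.
public sealed class IncrementalFilter
{
    private readonly List<string> _all;
    private string _lastQuery = "";
    private List<string> _lastResults;

    public IncrementalFilter(List<string> all)
    {
        _all = all;
        _lastResults = all;
    }

    public List<string> Search(string query)
    {
        // If the new query contains the previous one, every item that matches
        // the new query also matched the previous one, so the old results suffice.
        bool canRefine = _lastQuery.Length > 0 && query.Contains(_lastQuery);
        List<string> source = canRefine ? _lastResults : _all;

        var results = new List<string>();
        foreach (var item in source)
        {
            if (item.Contains(query))
                results.Add(item);
        }

        _lastQuery = query;
        _lastResults = results;
        return results;
    }
}
```

Typing "ab" scans the full list once; adding "c" afterwards scans only the items that matched "ab". Deleting a character or typing something unrelated falls back to the full list.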

The third step may be harder: keep the data organized so it can be searched quickly. Now you have to change the structure you use to store your data. Imagine a tree like this:

A        B         C
 Add      Better    Ceil
 Above    Bone      Contour

This may simply be implemented with an array (if you're working with ANSI names; otherwise a dictionary would be better). Build the list like this (for illustration purposes, it matches the beginning of the string):

var dictionary = new Dictionary<char, List<string>>();
foreach (var user in users)
{
    char letter = user[0];
    if (dictionary.ContainsKey(letter))
        dictionary[letter].Add(user);
    else
    {
        var newList = new List<string>();
        newList.Add(user);
        dictionary.Add(letter, newList);
    }
}

The search will then be done using the first character:

char letter = textBox_search.Text[0];
if (dictionary.ContainsKey(letter))
{
    listBox_choices.DataSource =
        new MyListWrapper(dictionary[letter].Where(x => x.Contains(textBox_search.Text)));
}

Please note I used MyListWrapper() as suggested in the first step (but I omitted my 2nd suggestion for brevity; if you choose the right size for the dictionary key you may keep each list short and fast enough to, maybe, avoid anything else). Moreover, note that you may try to use the first two characters as the dictionary key (more lists, each shorter). If you extend this you'll have a tree (but I don't think you have such a big number of items).

There are many different algorithms for string searching (with related data structures), just to mention a few:

  • Finite state automaton based search: in this approach, we avoid backtracking by constructing a deterministic finite automaton (DFA) that recognizes the stored search string. These are expensive to construct (they are usually created using the powerset construction) but are very quick to use.
  • Stubs: Knuth–Morris–Pratt computes a DFA that recognizes inputs with the string to search for as a suffix; Boyer–Moore starts searching from the end of the needle, so it can usually jump ahead a whole needle-length at each step. Baeza–Yates keeps track of whether the previous j characters were a prefix of the search string, and is therefore adaptable to fuzzy string searching. The bitap algorithm is an application of Baeza–Yates' approach.
  • Index methods: faster search algorithms are based on preprocessing of the text. After building a substring index, for example a suffix tree or suffix array, the occurrences of a pattern can be found quickly.
  • Other variants: some search methods, for instance trigram search, are intended to find a "closeness" score between the search string and the text rather than a "match/non-match". These are sometimes called "fuzzy" searches.

A few words about parallel search. It's possible, but it's seldom trivial, because the overhead of making it parallel can easily be much higher than the search itself. I wouldn't perform the search itself in parallel (partitioning and synchronization will soon become too expensive and maybe complex) but I would move the search to a separate thread. If the main thread isn't busy, your users won't feel any delay while they're typing (they won't notice if the list appears after 200 ms, but they'll feel uncomfortable if they have to wait 50 ms after each key they type). Of course the search itself must be fast enough; in this case you don't use threads to speed up the search but to keep your UI responsive. Please note that a separate thread will not make your query faster. It won't hang the UI, but if your query was slow it'll still be slow in a separate thread (moreover, you have to handle multiple sequential requests too).

You could try using PLINQ (Parallel LINQ). This does not guarantee a speed boost, though; you need to find that out by trial and error.

I doubt you'll be able to make it faster, but for sure you should:

a) Use the AsParallel LINQ extension method

b) Use some kind of timer to delay filtering

c) Put the filtering method on another thread

Keep some kind of string previousTextBoxValue somewhere. Make a timer with a delay of 1000 ms that fires the search on tick if previousTextBoxValue is the same as your textbox.Text value. If not, reassign previousTextBoxValue to the current value and reset the timer. Start the timer in the textbox changed event, and it'll make your application smoother. Filtering 120,000 records in 1-3 seconds is OK, but your UI must remain responsive.

You can also try using the BindingSource.Filter property. I have used it and it works like a charm for filtering a bunch of records: update this property with the search text every time it changes. Another option would be to use AutoCompleteSource for the TextBox control.

Hope it helps!

I would try to sort the collection, search to match only the start of each string, and limit the results to some number.

So on initialization:

allUsers.Sort();

and search

allUsers.Where(item => item.StartsWith(textBox_search.Text))

Maybe you can add some cache.

Use Parallel LINQ. PLINQ is a parallel implementation of LINQ to Objects. PLINQ implements the full set of LINQ standard query operators as extension methods for the System.Linq namespace and has additional operators for parallel operations. PLINQ combines the simplicity and readability of LINQ syntax with the power of parallel programming. Just like code that targets the Task Parallel Library, PLINQ queries scale in the degree of concurrency based on the capabilities of the host computer.

Introduction to PLINQ

Understanding Speedup in PLINQ

Also you can use Lucene.Net

Lucene.Net is a port of the Lucene search engine library, written in C# and targeted at .NET runtime users. The Lucene search library is based on an inverted index. Lucene.Net has three primary goals:

From what I have seen, I agree with sorting the list.

However, sorting after the list has been constructed will be very slow; sort while building it and you will have a better execution time.

Otherwise, if you don't need to display the list or keep the order, use a hashmap.

The hashmap will hash your string and search at the exact offset. It should be faster, I think.

Try using the BinarySearch method; it should work faster than the Contains method.

Contains is O(n); BinarySearch is O(lg(n)).

I think that a sorted collection should be faster to search and slower when adding new elements, but as I understand it you only have a search performance problem.
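As a sketch of how BinarySearch could be applied to a sorted list: it locates an exact value or the position where a value would be inserted, so it can serve prefix lookups but not the substring matching from the question. SortedPrefixSearch and FindByPrefix are hypothetical names, and the list is assumed to have been sorted with StringComparer.Ordinal.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: prefix lookup on a list sorted with StringComparer.Ordinal.
public static class SortedPrefixSearch
{
    public static List<string> FindByPrefix(List<string> sortedOrdinal, string prefix, int maxResults)
    {
        var results = new List<string>();

        // BinarySearch returns the index of a match, or the bitwise complement
        // of the index of the next larger element when there is no exact match.
        int index = sortedOrdinal.BinarySearch(prefix, StringComparer.Ordinal);
        if (index < 0)
        {
            index = ~index;
        }
        else
        {
            // With duplicates, any matching index may be returned; step back to the first.
            while (index > 0 && sortedOrdinal[index - 1] == prefix)
                index--;
        }

        // All strings sharing the prefix are contiguous from here on.
        for (int i = index; i < sortedOrdinal.Count && results.Count < maxResults; i++)
        {
            if (!sortedOrdinal[i].StartsWith(prefix, StringComparison.Ordinal))
                break;
            results.Add(sortedOrdinal[i]);
        }
        return results;
    }
}
```

The list has to be sorted once with the same comparer, for example allUsers.Sort(StringComparer.Ordinal), before calling FindByPrefix(allUsers, textBox_search.Text, 20).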
