
C# Dictionary fast search of closest value to key

I have a really long string (thousands of lines). I'm running regular expressions against the string and trying to identify the line numbers of the matches. However, if I have a high match count (say, 10,000), finding the line number of each match means searching the html string again every time, which gets expensive.

What I want to do is scan the string beforehand and build a hashtable that maps the character position of each line ending to its line number. I could then use a Dictionary, built with the following code, to find my line numbers.

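// Find line endings: map the character position of each '\n' to its 1-based line number.
var lineEndings = new Dictionary<int, int>();
int lineCount = 0;
// Use < rather than <= here; html[html.Length] would throw IndexOutOfRangeException.
for (int charCount = 0; charCount < html.Length; charCount++)
{
     if (html[charCount] == '\n')
     {
         lineCount++;
         lineEndings.Add(charCount, lineCount);
     }
}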

However, when I run my RegExes, how do I search this dictionary? The character position of a regex match will fall between two keys in the lineEndings dictionary. Given a Dictionary with a set of gapped keys and a value that's not in the key list, what's the best / most efficient way to find the next closest key?

One thing I've tried, and I'm not sure how it would perform, is

lineEndings.First(n => n.Key >= match.Index).Value

Dictionaries don't work when your definition of "equal" is just "close".

It's important that equality between items in a dictionary be transitive. If A = B and B = C, then A should equal C. If that's not the case (and it isn't when equality is defined as just "close"), things start breaking down. For example, with a tolerance of 1, 1 is close to 2 and 2 is close to 3, but 1 is not close to 3.

To start with, there's no way that you can write an effective GetHashCode implementation here. The only way for it to ever be valid is for everything to return the same value, which means you've degraded your performance to a linear search anyway.

What you can do, given that you have a static set of values, is put them all in a List or array, sort them, and then use a BinarySearch. Since the data appears to be static, the fact that adding items to the lookup table is expensive shouldn't be a problem. A binary search is also capable of telling you where the item you are searching for would belong if it were added. This means you can go to the index at that position to find the "next" item, and subtract one to find the "previous" item.
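As a rough illustration of the binary search approach, here is a minimal sketch applied to the line-number problem from the question. The class name LineLookup and the methods FindLineEndings and GetLineNumber are hypothetical names made up for this example. It assumes lines are terminated by '\n' and that a '\n' character counts as part of the line it ends. It relies on List<T>.BinarySearch returning a negative number when the item is missing, whose bitwise complement is the index of the next larger element (or Count if there is none).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of the original posts.
public static class LineLookup
{
    // Collects the character index of every '\n'. The indices come out
    // in ascending order, so no separate sort is needed.
    public static List<int> FindLineEndings(string text)
    {
        var lineEndings = new List<int>();
        for (int i = 0; i < text.Length; i++)
        {
            if (text[i] == '\n')
            {
                lineEndings.Add(i);
            }
        }
        return lineEndings;
    }

    // Returns the 1-based line number of the character at charIndex.
    public static int GetLineNumber(List<int> lineEndings, int charIndex)
    {
        int pos = lineEndings.BinarySearch(charIndex);
        if (pos < 0)
        {
            // Not an exact hit: ~pos is the index of the first line ending
            // after charIndex, or lineEndings.Count if there is none.
            pos = ~pos;
        }
        // pos line endings come before this line, so the line number is pos + 1.
        return pos + 1;
    }
}
```

With this in place, each regex match could be resolved with something like LineLookup.GetLineNumber(lineEndings, match.Index), which costs O(log n) per match instead of a rescan of the string.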

You could use LINQ with your dictionary if you know what range you want the keys to be in. Something like this:

    Dictionary<int, string> Test1 = new Dictionary<int, string>();
    public Form1()
    {
        InitializeComponent();
        Test1.Add(1, "asdf");
        Test1.Add(2, "ghjh");
        Test1.Add(3, "jkl;");
        Test1.Add(4, "qwer");

        int max = 4;
        int min = 1;
        listBox1.DataSource = (from kvp in Test1
                               where (kvp.Key > min && kvp.Key < max)
                               select (kvp.Value)).ToList();

    }

This creates a collection of values from the dictionary whose keys fall within a certain range.


 