What is the fastest (performance) way to compare if a string is present in a large list of objects?

Currently I have an object which contains two strings:

class myClass
{
    public string string1 { get; set; }
    public string string2 { get; set; }

    public bool MatcheString1(string newString)
    {
        if (this.string1 == newString)
        {
            return true;
        }
        return false;
    }
}

I then have a second class that builds a list of the aforementioned objects using a List<myClass>.

class URLs : IEnumerator, IEnumerable
{
    private List<myClass> myCustomList;
    private int position = -1;

    //  Constructor
    public URLs()
    {
        myCustomList = new List<myClass>();
    }
}

In that class I'm using a method to check whether a string is present in the list:

//  We can also check if the URL string is present in the collection
public bool ContainsString1(string newString)
{
    foreach (myClass entry in myCustomList)
    {
        if (entry.MatcheString1(newString))
        {
            return true;
        }
    }

    return false;
}

Essentially, as the list of objects grows to the 100,000 mark, this process becomes very slow. What is a fast way to check whether that string is present? I'm happy to create a List outside of the class for validation, but that seems hacky to me.

Once the list of items is stable, you can compute a hash-set of the matches, for example:

// up-front work
var knownStrings = new HashSet<string>();
foreach(var item in myCustomList) knownStrings.Add(item.string1);

(Note that this is not free, and will need to be re-computed as the list changes.) Then, later, you can just check:

return knownStrings.Contains(newString);

which is then very cheap (O(1) instead of O(N)).
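On the worry that a separate collection feels hacky: one option is to keep the set inside the class and update it wherever the list is updated, so the two cannot drift apart. The following is only a minimal sketch under assumptions. The question does not show how items get into the list, so the `Add` method here is hypothetical, and `myClass` is trimmed down to its two properties.

```csharp
using System;
using System.Collections.Generic;

class myClass
{
    public string string1 { get; set; }
    public string string2 { get; set; }
}

class URLs
{
    private readonly List<myClass> myCustomList = new List<myClass>();

    // Index of every string1 value currently in the list.
    // Ordinal comparison matches what the == operator does for strings.
    private readonly HashSet<string> knownStrings =
        new HashSet<string>(StringComparer.Ordinal);

    // Hypothetical method (not in the question): every insert goes
    // through here so the list and the set stay in sync.
    public void Add(myClass entry)
    {
        myCustomList.Add(entry);
        knownStrings.Add(entry.string1); // returns false for a duplicate, does not throw
    }

    // Average O(1) lookup instead of scanning the whole list.
    public bool ContainsString1(string newString)
    {
        return knownStrings.Contains(newString);
    }
}
```

If items can also be removed, the set needs matching care: with duplicate string1 values in the list, removing one entry must not drop the string from the set while another entry still carries it.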

If you don't mind using a different data structure instead of a list, you could use a dictionary where your objects are indexed by their string1 property.

public URLs()
{
    myDictionary = new Dictionary<string, myClass>();
}

Since Dictionary<TKey, TValue> can usually find elements in O(1) time, you can perform that check very fast.

if(myDictionary.ContainsKey(newString))
  //...
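To make the dictionary idea a little more concrete, here is a rough sketch of how the class might look. It assumes string1 values are unique and non-null, since a dictionary key must be both. The `Add` and `TryGet` methods are hypothetical names introduced for illustration, and `myClass` is reduced to its properties.

```csharp
using System;
using System.Collections.Generic;

class myClass
{
    public string string1 { get; set; }
    public string string2 { get; set; }
}

class URLs
{
    private readonly Dictionary<string, myClass> myDictionary =
        new Dictionary<string, myClass>();

    // Hypothetical method: the indexer overwrites an existing entry with
    // the same key, whereas Dictionary.Add would throw ArgumentException.
    public void Add(myClass entry)
    {
        myDictionary[entry.string1] = entry;
    }

    // Average O(1) existence check.
    public bool ContainsString1(string newString)
    {
        return myDictionary.ContainsKey(newString);
    }

    // Hypothetical helper: checks for the key and fetches the object
    // in a single lookup.
    public bool TryGet(string key, out myClass entry)
    {
        return myDictionary.TryGetValue(key, out entry);
    }
}
```

Compared with a plain HashSet<string>, this also hands back the matching object, which helps if string2 is needed after the check.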

Searching a sorted array (list) takes O(log N):

var sortedList = new SortedSet<string>();
sortedList.Add("abc");
// and so on
sortedList.Contains("test");

A HashSet lookup takes O(1), but with 100k elements the logarithm is small (log10(100,000) = 5; a binary search needs about log2(100,000) ≈ 17 comparisons), so I'd guess there is almost no practical difference from the HashSet, which takes more memory.
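Note that the snippet above uses SortedSet<string>, which is a tree-based set rather than a sorted list, though its lookups are also O(log N). For a literal sorted list, a sketch using List<T>.BinarySearch might look like the following. The `SortedLookup` helper is a hypothetical name, and the list is assumed to have been sorted with the same comparer that is used for the search.

```csharp
using System;
using System.Collections.Generic;

static class SortedLookup
{
    // Assumes 'sorted' was already sorted with StringComparer.Ordinal.
    // BinarySearch returns a non-negative index when the value is found
    // and a negative number otherwise.
    public static bool Contains(List<string> sorted, string value)
    {
        return sorted.BinarySearch(value, StringComparer.Ordinal) >= 0;
    }
}
```

A possible usage, sorting once up front and then searching:

```csharp
var strings = new List<string> { "ghi", "abc", "def" };
strings.Sort(StringComparer.Ordinal);                   // one-time O(N log N) cost
bool present = SortedLookup.Contains(strings, "test");  // O(log N) per lookup
```

The list has to be re-sorted (or new items inserted at the correct position) whenever it changes, otherwise the binary search result is not reliable.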
