
Faster way to find the first occurrence of a string in a list

I have a method that finds the first occurrences of words in a list. wordSet is the set of words I need to check. The list, pwWords, represents a text, so its words appear in the same order as in the text. For example, if pwWords has the elements {This,is,good,boy,and,this,girl,is,bad} and wordSet has {this,is}, the method should add true only for the first two elements. My question is: is there a faster way to do this? When pwWords has over a million elements and wordSet over 10,000, it runs quite slowly.

public List<bool> getFirstOccurances(List<string> pwWords)
{
    var firstOccurance = new List<bool>();
    var wordSet = new List<String>(WordsWithFDictionary.Keys);
    foreach (var pwWord in pwWords)
    {
        if (wordSet.Contains(pwWord))
        {
            firstOccurance.Add(true);
            wordSet.Remove(pwWord);
        }
        else
        {
            firstOccurance.Add(false);
        }
    }
    return firstOccurance;
}

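Another approach is to use a HashSet for wordSet. HashSet.Remove returns true only if the item was present, so a single call both tests for the word and makes sure later occurrences are reported as false:

public List<bool> getFirstOccurances(List<string> pwWords)
{
    // Copy the keys so that removing found words does not touch the dictionary.
    var wordSet = new HashSet<string>(WordsWithFDictionary.Keys);
    // Remove returns true only the first time a word from the set is seen.
    return pwWords.Select(word => wordSet.Remove(word)).ToList();
}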

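HashSet<T> lookups and removals (Contains, Remove) are O(1) on average, whereas List<T>.Contains and List<T>.Remove scan the list item by item until a match is found, which is O(n) per call. With a million words and a 10,000-word list, the original loop can therefore perform on the order of billions of string comparisons.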
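
To make the complexity argument concrete, here is a minimal, self-contained sketch of a single-pass first-occurrence check, run on the sample data from the question. The helper name MarkFirstOccurrences is hypothetical, and the case-insensitive comparer is an assumption made so that "This" matches "this" as the question's example expects. The sketch uses top-level statements, so it assumes C# 9 or later.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: flags, for each word of the text, whether it is
// the first occurrence of one of the wanted words.
static List<bool> MarkFirstOccurrences(IEnumerable<string> text, IEnumerable<string> wanted)
{
    // Assumption: case-insensitive matching, so that "This" matches "this".
    var remaining = new HashSet<string>(wanted, StringComparer.OrdinalIgnoreCase);
    var result = new List<bool>();
    foreach (var word in text)
    {
        // Remove returns true only if the word was still in the set,
        // i.e. the first time it is seen; later occurrences yield false.
        result.Add(remaining.Remove(word));
    }
    return result;
}

var pwWords = new List<string> { "This", "is", "good", "boy", "and", "this", "girl", "is", "bad" };
var wordSet = new[] { "this", "is" };

var flags = MarkFirstOccurrences(pwWords, wordSet);
Console.WriteLine(string.Join(",", flags));
// prints: True,True,False,False,False,False,False,False,False
```

Each word costs one hash lookup on average, so the whole pass is roughly linear in the length of the text, regardless of how many words are in the set.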

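For better performance, create the word set only once if you check the same words against several texts. The shared set must then stay unmodified, so each call tracks the words it has already reported in a separate set.

public class FirstOccurances
{
    private readonly HashSet<string> _wordSet;

    public FirstOccurances(IEnumerable<string> wordKeys)
    {
        _wordSet = new HashSet<string>(wordKeys);
    }

    public List<bool> GetFor(List<string> words)
    {
        // Per-call state: words already reported for this text.
        var seen = new HashSet<string>();
        // Add returns false for a word that was already added, so only
        // the first occurrence of a word from _wordSet yields true.
        return words.Select(word => _wordSet.Contains(word) && seen.Add(word)).ToList();
    }
}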

Then use it

var occurrences = new FirstOccurances(WordsWithFDictionary.Keys);

// Now you can efficiently search for occurrences multiple times
var result = occurrences.GetFor(pwWords);
var anotherResult = occurrences.GetFor(anotherPwWords);

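Because each item of pwWords can be tested for membership independently, a plain membership check (is the word in the set at all?) can be run with Parallel LINQ. Add AsOrdered() so that result i still corresponds to words[i]; without it, PLINQ does not guarantee the order of the output. Note that this variant does not detect first occurrences: that requires state that changes as the list is traversed, and HashSet<T> is not safe for concurrent modification, so a first-occurrence pass should stay sequential.

public List<bool> GetFor(List<string> words)
{
    return words.AsParallel().AsOrdered().Select(word => _wordSet.Contains(word)).ToList();
}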
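
As a rough illustration of the parallel membership check, the following self-contained sketch runs an ordered PLINQ query over the sample words from the question. The case-insensitive comparer is an assumption, not part of the original answer, and the set is only read during the query, never modified. The sketch uses top-level statements, so it assumes C# 9 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumption: case-insensitive matching, so that "This" matches "this".
var wordSet = new HashSet<string>(new[] { "this", "is" }, StringComparer.OrdinalIgnoreCase);
var pwWords = new List<string> { "This", "is", "good", "boy", "and", "this", "girl", "is", "bad" };

// AsOrdered keeps isMember[i] aligned with pwWords[i].
// The query only reads from wordSet; it never adds or removes items.
var isMember = pwWords
    .AsParallel()
    .AsOrdered()
    .Select(word => wordSet.Contains(word))
    .ToList();

Console.WriteLine(string.Join(",", isMember));
// prints: True,True,False,False,False,True,False,True,False
```

Unlike the first-occurrence result, the later "this" and "is" are also flagged here. For a check as cheap as a hash lookup, the overhead of parallelization may outweigh the gain, so it is worth measuring before adopting it.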


 