Optimal Compare Algorithm for finding string matches in List of strings C#

Say I have a list of 100,000 words. I want to find out whether a given string matches any word in that list, and I want to do it as fast as possible. I also want to know whether any other words formed from the start of that string (its prefixes, beginning with the first character) appear in the list.

For example:

Say you have the string "icedtgg". Its prefixes are:

"i" "ic" "ice" "iced" "icedt" "icedtg" "icedtgg"

I am trying to come up with an optimal compare algorithm that tells me which of these words are in my list.

So far, my list of 100,000 words is stored in a

Dictionary<char, List<string>> WordList;

where the char is the first character of the word, and the List<string> holds all of the words that start with that character.

So WordList['a'] holds a list of all words that start with 'a' ("ape", "apple", "art", etc.), WordList['b'] holds all words that start with 'b', and so on.
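The following is a sketch of how such a WordList might be built from a flat word list. It assumes ordinal (character-code) sorting, and WordListBuilder is a hypothetical name:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: groups words by their first character into sorted buckets.
public static class WordListBuilder
{
    public static Dictionary<char, List<string>> Build(IEnumerable<string> words)
    {
        var wordList = new Dictionary<char, List<string>>();
        foreach (var word in words)
        {
            if (string.IsNullOrEmpty(word)) continue; // no first character to key on
            if (!wordList.TryGetValue(word[0], out var bucket))
            {
                bucket = new List<string>();
                wordList[word[0]] = bucket;
            }
            bucket.Add(word);
        }
        // Ordinal order puts a word before every longer word it is a prefix of ("i" < "ice" < "iced").
        foreach (var list in wordList.Values)
            list.Sort(StringComparer.Ordinal);
        return wordList;
    }
}
```

Ordinal sorting guarantees that a one-letter word like "i" comes first in its bucket, which is the property the `CurrentWordList[0].Length == 1` check relies on.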

Since I know that all of these candidate words start with 'i', I can first narrow the search down from 100,000 words to just the words that start with 'i'.

List<string> CurrentWordList = WordList['i'];

Now I check

if( CurrentWordList[0].Length == 1 )

If so, I know my first string "i" is a match, because "i" will be the first word in the list. These lists are sorted alphabetically beforehand, so as not to slow down the matching.
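Because each bucket is sorted, the membership test can be a binary search for any prefix, not just the one-letter word. The sketch below uses `List<T>.BinarySearch` with `StringComparer.Ordinal` and assumes the bucket was sorted with that same comparer. SortedBucketLookup is a hypothetical name:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: tests every prefix of `input` against one sorted bucket.
public static class SortedBucketLookup
{
    // `bucket` must be sorted with StringComparer.Ordinal for BinarySearch to be valid.
    public static List<string> FindPrefixWords(List<string> bucket, string input)
    {
        var found = new List<string>();
        for (int len = 1; len <= input.Length; len++)
        {
            string prefix = input.Substring(0, len);
            // BinarySearch returns a non-negative index only for an exact match.
            if (bucket.BinarySearch(prefix, StringComparer.Ordinal) >= 0)
                found.Add(prefix);
        }
        return found;
    }
}
```

This costs O(log n) string comparisons per prefix, so O(k log n) for an input of length k.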

Any ideas?

*No, this is not a homework assignment. I am a professional software architect trying to find an optimal matching algorithm for fun/hobby/game development.
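For comparison, here is a minimal baseline that is not taken from any of the answers. It puts every word in a `HashSet<string>` and tests each prefix of the input with one hash lookup. PrefixHashLookup is a hypothetical name:

```csharp
using System.Collections.Generic;

// Hypothetical baseline: one hash lookup per prefix ("i", "ic", "ice", ...).
public static class PrefixHashLookup
{
    public static List<string> FindPrefixWords(HashSet<string> words, string input)
    {
        var found = new List<string>();
        for (int len = 1; len <= input.Length; len++)
        {
            string prefix = input.Substring(0, len); // allocates one string per prefix
            if (words.Contains(prefix))
                found.Add(prefix);
        }
        return found;
    }
}
```

Each query does k lookups on prefixes of length 1..k, so the expected cost is O(k²) characters hashed and copied, independent of the dictionary size. The trade-off is that a hash set cannot enumerate the words that merely share a prefix.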

I decided to add this answer not because it is the optimal solution to your problem, but to illustrate two possible solutions that are relatively simple and somewhat in line with the approach you seem to be following yourself.

The (non-optimized) sample below provides an extremely simple prefix trie implementation that uses one node per consumed character.

using System.Collections.Generic;

public class SimplePrefixTrie
{
    private readonly Node _root = new Node(); // root represents empty string.

    private class Node
    {
        public Dictionary<char, Node> Children;
        public bool IsTerminal; // whether a full word ends here.

        public Node Find(string word, int index)
        {
            var child = default(Node);
            if (index < word.Length && Children != null)
                Children.TryGetValue(word[index], out child);
            return child;
        }

        public Node Add(string word, int toConsume)
        {
            var child = default(Node);
            if (toConsume == word.Length)
                this.IsTerminal = true;
            else if (Children == null || !Children.TryGetValue(word[toConsume], out child))
            {
                if (Children == null)
                    Children = new Dictionary<char, Node>();
                Children[word[toConsume]] = child = new Node();
            }
            return child;
        }
    }

    public void AddWord(string word)
    {
        var ndx = 0;
        var cur = _root;
        while (cur != null)
            cur = cur.Add(word, ndx++);
    }

    public IEnumerable<string> FindWordsMatchingPrefixesOf(string searchWord)
    {
        var ndx = 0;
        var cur = _root;
        while (cur != null)
        {
            if (cur.IsTerminal)
                yield return searchWord.Substring(0, ndx);
            cur = cur.Find(searchWord, ndx++);
        }
    }
}

A simple implementation of a compressed prefix trie is also included below. It follows an almost identical approach to the sample above, but stores shared prefix parts instead of single characters. When part of a node's stored prefix becomes shared with a new word, the node is split into two parts.

using System;
using System.Collections.Generic;

public class SimpleCompressedPrefixTrie
{
    private readonly Node _root = new Node();

    private class Node
    {
        private Dictionary<char, Node> _children;
        public string PrefixValue = string.Empty;
        public bool IsTerminal;

        public Node Add(string word, ref int startIndex)
        {
            var n = FindSharedPrefix(word, startIndex);
            startIndex += n;
            if (n == PrefixValue.Length) // full prefix match
            {
                if (startIndex == word.Length) // full match
                    IsTerminal = true;
                else
                    return AddToChild(word, ref startIndex);
            }
            else // partial match, need to split this node's prefix.
                SplittingAdd(word, n, ref startIndex);
            return null;
        }

        public Node Find(string word, ref int startIndex, out int matchLen)
        {
            var n = FindSharedPrefix(word, startIndex);
            startIndex += n;
            matchLen = -1;
            if (n == PrefixValue.Length)
            {
                if (IsTerminal)
                    matchLen = startIndex;
                var child = default(Node);
                if (_children != null && startIndex < word.Length && _children.TryGetValue(word[startIndex], out child))
                {
                    startIndex++; // consumed map key character.
                    return child;
                }
            }
            return null;
        }

        private Node AddToChild(string word, ref int startIndex)
        {
            var key = word[startIndex++]; // consume the mapping character
            var nextNode = default(Node);
            if (_children == null)
                _children = new Dictionary<char, Node>();
            else if (_children.TryGetValue(key, out nextNode))
                return nextNode;
            var remainder = word.Substring(startIndex);
            _children[key] = new Node() { PrefixValue = remainder, IsTerminal = true };
            return null; // consumed.
        }

        private void SplittingAdd(string word, int n, ref int startIndex)
        {
            var curChildren = _children;
            _children = new Dictionary<char, Node>();
            _children[PrefixValue[n]] = new Node()
            {
                PrefixValue = this.PrefixValue.Substring(n + 1),
                IsTerminal = this.IsTerminal,
                _children = curChildren
            };
            PrefixValue = PrefixValue.Substring(0, n);
            IsTerminal = startIndex == word.Length;
            if (!IsTerminal)
            {
                var prefix = word.Length > startIndex + 1 ? word.Substring(startIndex + 1) : string.Empty;
                _children[word[startIndex]] = new Node() { PrefixValue = prefix, IsTerminal = true };
                startIndex++;
            }
        }

        private int FindSharedPrefix(string word, int startIndex)
        {
            var n = Math.Min(PrefixValue.Length, word.Length - startIndex);
            var len = 0;
            while (len < n && PrefixValue[len] == word[len + startIndex])
                len++;
            return len;
        }
    }

    public void AddWord(string word)
    {
        var ndx = 0;
        var cur = _root;
        while (cur != null)
            cur = cur.Add(word, ref ndx);
    }

    public IEnumerable<string> FindWordsMatchingPrefixesOf(string searchWord)
    {
        var startNdx = 0;
        var cur = _root;
        while (cur != null)
        {
            var matchLen = 0;
            cur = cur.Find(searchWord, ref startNdx, out matchLen);
            if (matchLen > 0)
                yield return searchWord.Substring(0, matchLen);
        }
    }
}

Usage example:

var trie = new SimplePrefixTrie(); // or new SimpleCompressedPrefixTrie();
trie.AddWord("hello");
trie.AddWord("iced");
trie.AddWord("i");
trie.AddWord("ice");
trie.AddWord("icecone");
trie.AddWord("dtgg");
trie.AddWord("hicet");
foreach (var w in trie.FindWordsMatchingPrefixesOf("icedtgg"))
    Console.WriteLine(w);

With output:

i
ice
iced

UPDATE: Selecting the right data structure matters

I think an update could add some value by illustrating why it is important to select a data structure that fits the problem, and what kinds of trade-offs are involved. Therefore I created a small benchmark application that tests the strategies in the answers provided to this question so far against a baseline reference implementation.

  • Naive: the simplest possible naive solution.
  • JimMischel: based on the approach from this answer.
  • MattyMerrix: based on your own answer here.
  • JimMattyDSL: combines the 'JimMischel' and 'MattyMerrix' approaches and uses a more efficient binary string search in the sorted list.
  • SimpleTrie and CompessedTrie: based on the two implementations described in this answer.

The full benchmark code can be found in this gist. Here are the results of running it with dictionaries of 10,000, 100,000, and 1,000,000 words (randomly generated character sequences) and searching for all prefix matches of 5,000 terms:

Matching 5000 words to a dictionary of 10000 terms of max length 25

       Method              Memory (MB)         Build Time (s)        Lookup Time (s)
        Naive          0.64-0.64, 0.64     0.001-0.002, 0.001     6.136-6.312, 6.210
   JimMischel          0.84-0.84, 0.84     0.013-0.018, 0.016     0.083-0.113, 0.102
  JimMattyDSL          0.80-0.81, 0.80     0.013-0.018, 0.016     0.008-0.011, 0.010
   SimpleTrie       24.55-24.56, 24.56     0.042-0.056, 0.051     0.002-0.002, 0.002
CompessedTrie          1.84-1.84, 1.84     0.003-0.003, 0.003     0.002-0.002, 0.002
  MattyMerrix          0.83-0.83, 0.83     0.017-0.017, 0.017     0.034-0.034, 0.034

Matching 5000 words to a dictionary of 100000 terms of max length 25

       Method              Memory (MB)         Build Time (s)        Lookup Time (s)
        Naive          6.01-6.01, 6.01     0.024-0.026, 0.025  65.651-65.758, 65.715
   JimMischel          6.32-6.32, 6.32     0.232-0.236, 0.233     1.208-1.254, 1.235
  JimMattyDSL          5.95-5.96, 5.96     0.264-0.269, 0.266     0.050-0.052, 0.051
   SimpleTrie    226.49-226.49, 226.49     0.932-0.962, 0.951     0.004-0.004, 0.004
CompessedTrie       16.10-16.10, 16.10     0.101-0.126, 0.111     0.003-0.003, 0.003
  MattyMerrix          6.15-6.15, 6.15     0.254-0.269, 0.259     0.414-0.418, 0.416

Matching 5000 words to a dictionary of 1000000 terms of max length 25

       Method              Memory (MB)         Build Time (s)        Lookup Time (s)
   JimMischel       57.69-57.69, 57.69     3.027-3.086, 3.052  16.341-16.415, 16.373
  JimMattyDSL       60.88-60.88, 60.88     3.396-3.484, 3.453     0.399-0.400, 0.399
   SimpleTrie 2124.57-2124.57, 2124.57  11.622-11.989, 11.860     0.006-0.006, 0.006
CompessedTrie    166.59-166.59, 166.59     2.813-2.832, 2.823     0.005-0.005, 0.005
  MattyMerrix       62.71-62.73, 62.72     3.230-3.270, 3.251     6.996-7.015, 7.008
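The gist itself is not reproduced here. The sketch below is not the gist's code and all of its names are hypothetical. It shows one way a comparable measurement could be set up: random lowercase words are generated from a fixed seed, and lookups are timed with `System.Diagnostics.Stopwatch`.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical benchmark helpers (not the code from the gist).
public static class PrefixBenchmark
{
    // Random lowercase words of length 1..maxLength; a fixed seed makes runs repeatable.
    public static List<string> RandomWords(int count, int maxLength, int seed)
    {
        var rng = new Random(seed);
        var words = new List<string>(count);
        for (int n = 0; n < count; n++)
        {
            var chars = new char[rng.Next(1, maxLength + 1)];
            for (int i = 0; i < chars.Length; i++)
                chars[i] = (char)('a' + rng.Next(26));
            words.Add(new string(chars));
        }
        return words;
    }

    // Runs `lookup` for every term and counts the results. Enumerating them inside the
    // timed region forces lazy iterators to do their work there.
    public static (int Matches, double Seconds) TimeLookups(
        IEnumerable<string> terms, Func<string, IEnumerable<string>> lookup)
    {
        int matches = 0;
        var sw = Stopwatch.StartNew();
        foreach (var term in terms)
            foreach (var match in lookup(term))
                matches++;
        sw.Stop();
        return (matches, sw.Elapsed.TotalSeconds);
    }
}
```

Passing each strategy's lookup method as the delegate, e.g. `trie.FindWordsMatchingPrefixesOf`, lets all strategies be timed with the same loop.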

As you can see, the memory required for the (non-space-optimized) tries is substantially higher. For all of the tested implementations, memory grows linearly with the size of the dictionary: O(N).

As expected, lookup time for the tries is more or less constant: O(k), depending only on the length of the search term. For the other implementations, lookup time grows with the size of the dictionary being searched.

Note that considerably more efficient implementations for this problem can be constructed. They would have search times close to O(k), more compact storage, and a smaller memory footprint. If you map to a reduced alphabet (e.g. 'A'-'Z' only), that can be taken advantage of as well.
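Here is a minimal sketch of the reduced-alphabet idea. It assumes words contain only the letters 'a'-'z', matched case-insensitively. Each node holds a fixed 26-slot array indexed by `letter - 'a'` instead of a `Dictionary<char, Node>`, so each step is an array index rather than a hash lookup. AlphabetTrie is a hypothetical name, and this is not one of the benchmarked implementations.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: a prefix trie limited to the letters 'a'-'z' (case-insensitive).
public class AlphabetTrie
{
    private sealed class Node
    {
        public readonly Node[] Children = new Node[26]; // slot = letter - 'a'
        public bool IsTerminal;                          // a full word ends here
    }

    private readonly Node _root = new Node();

    // Maps 'a'-'z' / 'A'-'Z' to 0-25, and any other character to -1.
    private static int Slot(char c)
    {
        int i = char.ToLowerInvariant(c) - 'a';
        return i >= 0 && i < 26 ? i : -1;
    }

    public void AddWord(string word)
    {
        var cur = _root;
        foreach (char c in word)
        {
            int i = Slot(c);
            if (i < 0) throw new ArgumentException($"Unsupported character '{c}' in \"{word}\".");
            if (cur.Children[i] == null) cur.Children[i] = new Node();
            cur = cur.Children[i];
        }
        cur.IsTerminal = true;
    }

    // Yields each prefix of searchWord that was added as a word, shortest first.
    public IEnumerable<string> FindWordsMatchingPrefixesOf(string searchWord)
    {
        var cur = _root;
        for (int len = 1; len <= searchWord.Length; len++)
        {
            int i = Slot(searchWord[len - 1]);
            cur = i < 0 ? null : cur.Children[i];
            if (cur == null) yield break;
            if (cur.IsTerminal) yield return searchWord.Substring(0, len);
        }
    }
}
```

Whether this saves memory depends on how densely the nodes are populated, since every node pays for 26 references even if it has a single child.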

So you just want to find the words in the dictionary that are prefixes of the input string? You can do this much more efficiently than any of the methods proposed so far. It's really just a modified merge.

Suppose your word list is a dictionary keyed by first letter, with each entry containing a sorted list of the words that begin with that letter. Then the code below will do it. The worst case is O(n + m), where n is the number of words that start with the letter and m is the length of the input string.

// Assumes each list in MyDictionary is sorted ordinally, e.g. list.Sort(StringComparer.Ordinal),
// so that the comparisons below agree with the sort order.
var inputString = "icegdt";
// get list of words that start with the first character
var wordsList = MyDictionary[inputString[0]];
var matches = new List<string>();

// find all words that are prefixes of the input string
var iInput = 0;
var iWords = 0;
var prefix = inputString.Substring(0, iInput+1);
while (iInput < inputString.Length && iWords < wordsList.Count)
{
    if (wordsList[iWords] == prefix)
    {
        // wordsList[iWords] is found!
        matches.Add(wordsList[iWords]);
        ++iWords;
    }
    // C# has no > operator for strings, so compare explicitly.
    else if (string.CompareOrdinal(wordsList[iWords], prefix) > 0)
    {
        // The current word is alphabetically after the prefix.
        // So we need the next character.
        ++iInput;
        if (iInput < inputString.Length)
        {
            prefix = inputString.Substring(0, iInput+1);
        }
    }
    else
    {
        // The prefix is alphabetically after the current word.
        // Advance the current word.
        ++iWords;
    }
}

If this is all you want to do (find dictionary words that are prefixes of the input string), then there's no particular reason to index your dictionary by first character. Given a single sorted list of words, you could do a binary search on the first letter to find the starting point. That would take slightly more time than the dictionary lookup, but the difference would be very small compared to the time spent searching the word list for matches. In addition, the sorted word list would take less memory than the dictionary approach.
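Here is a sketch of that variant, assuming one flat `List<string>` sorted with `StringComparer.Ordinal`. `List<T>.BinarySearch` locates the first word that is at least the one-letter prefix, i.e. the first word starting with the input's first letter. A negative result is the bitwise complement of the insertion point. The modified merge from this answer then runs from that index. SortedListPrefixMatcher is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: prefix matching over a single ordinally sorted word list.
public static class SortedListPrefixMatcher
{
    public static List<string> FindPrefixWords(List<string> words, string input)
    {
        var matches = new List<string>();
        if (string.IsNullOrEmpty(input)) return matches;

        // Binary search for the start of the words beginning with input[0].
        string prefix = input.Substring(0, 1);
        int iWords = words.BinarySearch(prefix, StringComparer.Ordinal);
        if (iWords < 0) iWords = ~iWords; // insertion point when "x" itself is not a word

        // Modified merge: advance whichever of word and prefix is smaller.
        int iInput = 0;
        while (iInput < input.Length && iWords < words.Count)
        {
            int result = string.CompareOrdinal(words[iWords], prefix);
            if (result == 0)
            {
                matches.Add(words[iWords]);
                iWords++;
            }
            else if (result > 0)
            {
                iInput++;
                if (iInput < input.Length) prefix = input.Substring(0, iInput + 1);
            }
            else
            {
                iWords++;
            }
        }
        return matches;
    }
}
```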

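If you want to do case-insensitive comparisons, change the comparison code to the code below. The word lists must then also be sorted case-insensitively (e.g. with StringComparer.CurrentCultureIgnoreCase), so that the merge order agrees with the comparison.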

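    var result = String.Compare(wordsList[iWords], prefix, true);
    if (result == 0)
    {
        // wordsList[iWords] is found!
        ++iWords;
    }
    else if (result > 0)
    {
        ++iInput;
        if (iInput < inputString.Length)
        {
            prefix = inputString.Substring(0, iInput+1);
        }
    }
    else
    {
        ++iWords;
    }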
This also reduces the number of string comparisons to exactly one per iteration.

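// Assumes str is the string being scanned.
int x = 0;
while (x < str.Length - 1)
{
    // ChrW(10) is '\n' (LF) and ChrW(13) is '\r' (CR), so this tests for LF followed by CR.
    // A Windows line break is the reverse order, "\r\n".
    if (str[x] == '\n' && str[x + 1] == '\r')
    {
        // The new line starts at x + 2.
    }
    x++;
}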

Here is my first go at it. I wanted to get this out there in case I can't finish it today.

using System.Collections.Generic;
using System.Text;

public class CompareHelper
{
    //Should always be sorted in alphabetical order.
    public static Dictionary<char, List<string>> MyDictionary;
    public static List<string> CurrentWordList;
    public static List<string> MatchedWordList;

    //The word we are trying to find matches for.
    public static char InitChar;
    public static StringBuilder ThisWord;

    /// <summary>
    /// Initialize the Compare.  Set the first character.  See if there are any 1 letter words
    /// for that character.
    /// </summary>
    /// <param name="firstChar">The first character in the word string.</param>
    /// <returns>True if a word was found.</returns>
    public static bool InitCompare(char firstChar)
    {
        InitChar = firstChar;
        //Get all words that start with the firstChar.
        CurrentWordList = MyDictionary[InitChar];
        ThisWord = new StringBuilder();
        ThisWord.Append(firstChar);

        if (CurrentWordList[0].Length == 1)
        {
            //Match.
            return true;
        }
        //No matches.
        return false;
    }

    /// <summary>
    /// Append this letter to our ThisWord.  See if there are any matching words.
    /// </summary>
    /// <param name="nextChar">The next character in the word string.</param>
    /// <returns>True if a word was found.</returns>
    public static bool NextCompare(char nextChar)
    {
        ThisWord.Append(nextChar);
        int currentIndex = ThisWord.Length - 1;
        if (FindRemainingWords(nextChar, currentIndex))
        {
            //The list is sorted, so a word equal to ThisWord would be first.
            if (CurrentWordList[0].Length == ThisWord.Length)
            {
                //Match.
                return true;
            }
        }
        //No matches.
        return false;
    }

    /// <summary>
    /// Trim down our CurrentWordList until it only contains words
    /// that at currIndex start with the currChar.
    /// </summary>
    /// <param name="currChar">The next letter in our ThisWord.</param>
    /// <param name="currIndex">The index of the letter.</param>
    /// <returns>True if there are words remaining in CurrentWordList.</returns>
    private static bool FindRemainingWords(char currChar, int currIndex)
    {
        //Null check.
        if (CurrentWordList == null || CurrentWordList.Count < 1)
        {
            return false;
        }

        //Binary search: halve CurrentWordList until its middle word has currChar
        //at currIndex, or until it cannot be split any further.
        bool doneSearching = false;
        while (!doneSearching && CurrentWordList.Count > 1)
        {
            int middleIndex = CurrentWordList.Count / 2;
            string middleWord = CurrentWordList[middleIndex];

            //A word too short to have a letter at currIndex sorts before the longer
            //words sharing its prefix, so the candidates lie after it.
            if (middleWord.Length <= currIndex)
            {
                CurrentWordList = CurrentWordList.GetRange(middleIndex + 1, CurrentWordList.Count - middleIndex - 1);
                continue;
            }

            switch (GetLetterPosition(currChar, middleWord[currIndex]))
            {
                case LetterPositionEnum.Before:
                case LetterPositionEnum.PREV:
                    //The middle letter comes before currChar: keep the upper half.
                    CurrentWordList = CurrentWordList.GetRange(middleIndex + 1, CurrentWordList.Count - middleIndex - 1);
                    break;
                case LetterPositionEnum.MATCH:
                    //Matches may lie on both sides; TrimWords finds the exact range.
                    doneSearching = true;
                    break;
                default:
                    //NEXT or After: the middle letter comes after currChar, so keep the lower half.
                    CurrentWordList = CurrentWordList.GetRange(0, middleIndex);
                    break;
            }
        }

        TrimWords(currChar, currIndex);

        //True if there are still words left in CurrentWordList.
        return CurrentWordList.Count > 0;
    }

    //Trim CurrentWordList down to the contiguous run of words
    //that have currChar at currIndex.
    private static void TrimWords(char currChar, int currIndex)
    {
        int startIndex = -1;
        int endIndex = CurrentWordList.Count;

        //Loop through all of the words.
        for (int i = 0; i < CurrentWordList.Count; i++)
        {
            string word = CurrentWordList[i];
            bool hasChar = word.Length > currIndex && word[currIndex] == currChar;

            //The first word with currChar at currIndex is the start index.
            if (startIndex < 0 && hasChar)
            {
                startIndex = i;
            }
            //After that, the first word without currChar at currIndex is the end index.
            else if (startIndex >= 0 && !hasChar)
            {
                endIndex = i;
                break;
            }
        }

        //GetRange takes a start index and a count; keep nothing if no word matched.
        CurrentWordList = startIndex < 0
            ? new List<string>()
            : CurrentWordList.GetRange(startIndex, endIndex - startIndex);
    }


    //In order to find all words that begin with a given character, we should search
    //for the last word that begins with the previous character (PREV) and the 
    //first word that begins with the next character (NEXT).
    //Anything else Before or After that is trash and we will throw out.
    public enum LetterPositionEnum
    {
        Before,
        PREV,
        MATCH,
        NEXT,
        After
    };

    //Compact form of the hand-written switch over 'A'-'Z': MATCH if the letters are equal,
    //PREV/NEXT if compareLetter is the letter just before/after currChar, and Before/After
    //if it is further away. Assumes both letters are in the same case.
    public static LetterPositionEnum GetLetterPosition(char currChar, char compareLetter)
    {
        int diff = compareLetter - currChar;
        if (diff == 0) return LetterPositionEnum.MATCH;
        if (diff == -1) return LetterPositionEnum.PREV;
        if (diff == 1) return LetterPositionEnum.NEXT;
        return diff < 0 ? LetterPositionEnum.Before : LetterPositionEnum.After;
    }
}

OK, here is the final solution I came up with. I am not sure whether it is truly optimal, but it seems to be pretty darn fast, and I like the logic and love the brevity of the code.

Basically, on app startup you pass a list of words of any length to InitWords. This sorts the words and places them into a Dictionary that has 26 keys, one for each letter of the alphabet.

Then, during play, you iterate through the character set, always starting with the first letter, then the first and second letters, and so on. The whole time, you are trimming down the number of words in your CurrentWordList.

So say you have the string 'icedgt'. You call InitCompare with 'i', which grabs the list stored under key 'I' in MyDictionary. It then checks whether the first word has length 1; since the words are already in alphabetical order, the word 'I' would be the first word. On the next iteration you pass 'c' to NextCompare. This reduces the list again, using LINQ to return only the words whose second character is 'c'. Next you call NextCompare again with 'e', once more reducing the number of words in CurrentWordList using LINQ.

So after InitCompare, your CurrentWordList holds every word that starts with 'i'. After the next NextCompare it holds every word that starts with 'ic', and after the one after that, the subset where every word starts with 'ice', and so on.

I am not sure whether LINQ beats my manual gigantic switch statement in terms of speed, but it is simple and elegant, and for that I am happy.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace Xuzzle.Code
{
public class CompareHelper
{
    //Should always be sorted in alphabetical order.
    public static Dictionary<char, List<string>> MyDictionary;
    public static List<string> CurrentWordList;

    //The word we are trying to find matches for.
    public static char InitChar;
    public static StringBuilder ThisWord;

    /// <summary>
    /// Init MyDictionary with the list of words passed in.  Make a new
    /// key value pair with each Letter.
    /// </summary>
    /// <param name="listOfWords"></param>
    public static void InitWords(List<string> listOfWords)
    {
        MyDictionary = new Dictionary<char, List<string>>();
        //LetterHelper.Alphabet (not shown) is assumed to yield the upper-case letters 'A' through 'Z'.
        foreach (char currChar in LetterHelper.Alphabet)
        {
            var wordsParsed = listOfWords.Where(currWord => char.ToUpper(currWord[0]) == currChar).ToArray();
            Array.Sort(wordsParsed);
            MyDictionary.Add(currChar, wordsParsed.ToList());
        }
    }

    /// <summary>
    /// Initialize the Compare.  Set the first character.  See if there are any 1 letter words
    /// for that character.
    /// </summary>
    /// <param name="firstChar">The first character in the word string.</param>
    /// <returns>True if a word was found.</returns>
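    public static bool InitCompare(char firstChar)
    {
        InitChar = firstChar;
        ThisWord = new StringBuilder();
        ThisWord.Append(firstChar);

        //Get all words that start with the firstChar (MyDictionary is keyed by upper-case letters).
        if (!MyDictionary.TryGetValue(char.ToUpper(firstChar), out CurrentWordList))
        {
            CurrentWordList = new List<string>();
        }

        //The list is sorted, so a 1-letter word would be first.
        return CurrentWordList.Count > 0 && CurrentWordList[0].Length == 1;
    }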

    /// <summary>
    /// Append this letter to our ThisWord.  See if there are any matching words.
    /// </summary>
    /// <param name="nextChar">The next character in the word string.</param>
    /// <returns>True if a word was found.</returns>
    public static bool NextCompare(char nextChar)
    {
        ThisWord.Append(nextChar);
        int currentIndex = ThisWord.Length - 1;
        if (CurrentWordList != null && CurrentWordList.Count > 0)
        {
            CurrentWordList = CurrentWordList.Where(word => (word.Length > currentIndex && word[currentIndex] == nextChar)).ToList();
            if (CurrentWordList != null && CurrentWordList.Count > 0)
            {
                if (CurrentWordList[0].Length == ThisWord.Length)
                {
                    //Match.
                    return true;
                }
            }
        }
        //No matches.
        return false;
    }
}
}
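Because each list is sorted, the words that start with the current prefix always form one contiguous run. That means the LINQ `Where` in NextCompare, which scans the whole remaining list and allocates a new one each step, could be replaced by two binary searches over index bounds. The sketch below assumes the list is sorted ordinally. `Array.Sort` on strings uses a culture-sensitive comparison by default, so InitWords would need e.g. `Array.Sort(wordsParsed, StringComparer.Ordinal)` for this to be valid. SortedPrefixRange and Narrow are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: narrows a sorted index range instead of building new lists.
public static class SortedPrefixRange
{
    // First index in [lo, hi) whose word fails `isBefore`. `isBefore` must be true
    // for a leading run of the range and false for the rest.
    private static int PartitionPoint(List<string> words, int lo, int hi, Func<string, bool> isBefore)
    {
        while (lo < hi)
        {
            int mid = lo + (hi - lo) / 2;
            if (isBefore(words[mid])) lo = mid + 1;
            else hi = mid;
        }
        return lo;
    }

    // Given an ordinally sorted range [lo, hi) containing every word that starts with
    // `prefix`, returns the sub-range holding exactly those words.
    public static (int Lo, int Hi) Narrow(List<string> words, int lo, int hi, string prefix)
    {
        // Words that start with `prefix` are >= prefix, so the run begins at the lower bound.
        int start = PartitionPoint(words, lo, hi, w => string.CompareOrdinal(w, prefix) < 0);
        // From there, the run continues while words still start with `prefix`.
        int end = PartitionPoint(words, start, hi, w => w.StartsWith(prefix, StringComparison.Ordinal));
        return (start, end);
    }
}
```

Mirroring the `CurrentWordList[0].Length` check, the current prefix is a word when the range is non-empty and `words[Lo].Length == prefix.Length`.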

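Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please cite this site's URL or the original address. For any questions, contact yoyou2525@163.com.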

 
© 2020-2024 STACKOOM.COM