簡體   English   中英

列表的出現 <string> 在一個字符串C#

[英]Occurrences of a List<string> in a string C#

特定

var stringList = new List<string>(new string[] {
                   "outage","restoration","efficiency"});

var queryText = "While walking through the park one day, I noticed an outage",
              "in the lightbulb at the plant. I talked to an officer about", 
              "restoration protocol for public works, and he said to contact",
              "the department of public works, but not to expect much because",
              "they have low efficiency."

如何從queryText獲取stringList中所有字符串的總出現次數?

在上面的例子中,我想要一個返回3的方法;

private int stringMatches (string textToQuery, string[] stringsToFind)
{
    //
}

結果

太快了!

進行了幾次性能測試,Fabian的這個代碼分支速度更快了:

private int stringMatches(string textToQuery, string[] stringsToFind)
{
    int count = 0;
    foreach (var stringToFind in stringsToFind)
    {
        int currentIndex = 0;

    while ((currentIndex = textToQuery.IndexOf(stringToFind , currentIndex, StringComparison.Ordinal)) != -1)
    {
       currentIndex++;
       count++;
    }
    }
    return count;
}

執行時間:在使用秒表的10000次迭代循環中:

法比安:37-42毫秒

lazyberezovsky StringCompare:400-500毫秒

lazyberezovsky Regex:630-680毫秒

格倫:750-800毫秒

(為Fabians添加了StringComparison.Ordinal,以獲得更高的速度。)

那可能也很快:

private int stringMatches(string textToQuery, string[] stringsToFind)
{
  int count = 0;
  foreach (var stringToFind in stringsToFind)
  {
    int currentIndex = 0;

    while ((currentIndex = textToQuery.IndexOf(stringToFind , currentIndex, StringComparison.Ordinal)) != -1)
    {
     currentIndex++;
     count++;
    }
  }
  return count;
}

此LINQ查詢按空格和標點符號拆分文本,搜索匹配忽略大小寫

private int stringMatches(string textToQuery, string[] stringsToFind)
{
   StringComparer comparer = StringComparer.CurrentCultureIgnoreCase;
   return textToQuery.Split(new []{' ', '.', ',', '!', '?'}) // add more if need
                     .Count(w => stringsToFind.Contains(w, comparer));
}

或者使用正則表達式:

private static int stringMatches(string textToQuery, string[] stringsToFind)
{
    var pattern = String.Join("|", stringsToFind.Select(s => @"\b" + s + @"\b"));
    return Regex.Matches(textToQuery, pattern, RegexOptions.IgnoreCase).Count;
}

如果要計算字符串中其他集合中的單詞:

private int stringMatches(string textToQuery, string[] stringsToFind)
{
    return textToQuery.Split().Intersect(stringsToFind).Count();
}

我喜歡蒂姆的答案,但我盡量避免制作過多的字符串來避免性能問題,我確實喜歡正則表達式,所以這是另一種方法:

private int StringMatches(string searchMe, string[] keys)
{
    System.Text.RegularExpressions.Regex expression = new System.Text.RegularExpressions.Regex(string.Join("|", keys), System.Text.RegularExpressions.RegexOptions.IgnoreCase);
    return expression.Matches(searchMe).Count;
}

這是對Fabian Bigler原始答案的修訂。 它的速度提升了大約33%,主要是因為StringComparison.Ordinal。

這里有更多信息的鏈接: http//msdn.microsoft.com/en-us/library/bb385972.aspx

    private int stringMatches(string textToQuery, List<string> stringsToFind)
    {
        int count = 0, stringCount = stringsToFind.Count(), currentIndex;
        string stringToFind;
        for (int i = 0; i < stringCount; i++)
        {
            currentIndex = 0;
            stringToFind = stringsToFind[i];
            while ((currentIndex = textToQuery.IndexOf(stringToFind, currentIndex, StringComparison.Ordinal)) != -1)
            {
                currentIndex++;
                count++;
            }
        }
        return count;
    }

這將只匹配TextToQuery的單詞

這樣做的想法是檢查匹配前后的索引是不是字母。 另外,我必須確保檢查它是否是字符串的開頭或結尾。

  private int stringMatchesWordsOnly(string textToQuery, string[] wordsToFind)
        {
            int count = 0;
            foreach (var wordToFind in wordsToFind)
            {
                int currentIndex = 0;
                while ((currentIndex = textToQuery.IndexOf(wordToFind, currentIndex,         StringComparison.Ordinal)) != -1)
                {
                    if (((currentIndex == 0) || //is it the first index?
                          (!Char.IsLetter(textToQuery, currentIndex - 1))) &&
                          ((currentIndex == (currentIndex + wordToFind.Length)) || //has the end been reached?
                          (!Char.IsLetter(textToQuery, currentIndex + wordToFind.Length))))
                    {
                        count++;
                    }
                    currentIndex++;
                }
            }
            return count;
        }

結論:正如您所看到的,這種方法比我的其他答案有點麻煩,並且性能較差(盡管比其他答案更高效)。 所以它真的取決於你想要達到的目標。 如果你的字符串中有簡短的單詞要找,你應該接受這個答案,因為例如'和'顯然會與第一種方法返回太多匹配。

private int stringMatches(string textToQuery, string[] stringsToFind)
{
      string[] splitArray = textToQuery.Split(new char[] { ' ', ',','.' });
      var count = splitArray.Where(p => stringsToFind.Contains(p)).ToArray().Count();
      return count;
}

暫無
暫無

聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM