繁体   English   中英

列表的出现 <string> 在一个字符串C#

[英]Occurrences of a List<string> in a string C#

特定

var stringList = new List<string>(new string[] {
                   "outage","restoration","efficiency"});

var queryText = "While walking through the park one day, I noticed an outage",
              "in the lightbulb at the plant. I talked to an officer about", 
              "restoration protocol for public works, and he said to contact",
              "the department of public works, but not to expect much because",
              "they have low efficiency."

如何从queryText获取stringList中所有字符串的总出现次数?

在上面的例子中,我想要一个返回3的方法;

private int stringMatches (string textToQuery, string[] stringsToFind)
{
    //
}

结果

太快了!

进行了几次性能测试,Fabian的这个代码分支速度更快了:

private int stringMatches(string textToQuery, string[] stringsToFind)
{
    int count = 0;
    foreach (var stringToFind in stringsToFind)
    {
        int currentIndex = 0;

    while ((currentIndex = textToQuery.IndexOf(stringToFind , currentIndex, StringComparison.Ordinal)) != -1)
    {
       currentIndex++;
       count++;
    }
    }
    return count;
}

执行时间:在使用秒表的10000次迭代循环中:

法比安:37-42毫秒

lazyberezovsky StringCompare:400-500毫秒

lazyberezovsky Regex:630-680毫秒

格伦:750-800毫秒

(为Fabians添加了StringComparison.Ordinal,以获得更高的速度。)

那可能也很快:

private int stringMatches(string textToQuery, string[] stringsToFind)
{
  int count = 0;
  foreach (var stringToFind in stringsToFind)
  {
    int currentIndex = 0;

    while ((currentIndex = textToQuery.IndexOf(stringToFind , currentIndex, StringComparison.Ordinal)) != -1)
    {
     currentIndex++;
     count++;
    }
  }
  return count;
}

此LINQ查询按空格和标点符号拆分文本,搜索匹配忽略大小写

private int stringMatches(string textToQuery, string[] stringsToFind)
{
   StringComparer comparer = StringComparer.CurrentCultureIgnoreCase;
   return textToQuery.Split(new []{' ', '.', ',', '!', '?'}) // add more if need
                     .Count(w => stringsToFind.Contains(w, comparer));
}

或者使用正则表达式:

private static int stringMatches(string textToQuery, string[] stringsToFind)
{
    var pattern = String.Join("|", stringsToFind.Select(s => @"\b" + s + @"\b"));
    return Regex.Matches(textToQuery, pattern, RegexOptions.IgnoreCase).Count;
}

如果要计算字符串中其他集合中的单词:

private int stringMatches(string textToQuery, string[] stringsToFind)
{
    return textToQuery.Split().Intersect(stringsToFind).Count();
}

我喜欢蒂姆的答案,但我尽量避免制作过多的字符串来避免性能问题,我确实喜欢正则表达式,所以这是另一种方法:

private int StringMatches(string searchMe, string[] keys)
{
    System.Text.RegularExpressions.Regex expression = new System.Text.RegularExpressions.Regex(string.Join("|", keys), System.Text.RegularExpressions.RegexOptions.IgnoreCase);
    return expression.Matches(searchMe).Count;
}

这是对Fabian Bigler原始答案的修订。 它的速度提升了大约33%,主要是因为StringComparison.Ordinal。

这里有更多信息的链接: http//msdn.microsoft.com/en-us/library/bb385972.aspx

    private int stringMatches(string textToQuery, List<string> stringsToFind)
    {
        int count = 0, stringCount = stringsToFind.Count(), currentIndex;
        string stringToFind;
        for (int i = 0; i < stringCount; i++)
        {
            currentIndex = 0;
            stringToFind = stringsToFind[i];
            while ((currentIndex = textToQuery.IndexOf(stringToFind, currentIndex, StringComparison.Ordinal)) != -1)
            {
                currentIndex++;
                count++;
            }
        }
        return count;
    }

这将只匹配TextToQuery的单词

这样做的想法是检查匹配前后的索引是不是字母。 另外,我必须确保检查它是否是字符串的开头或结尾。

  private int stringMatchesWordsOnly(string textToQuery, string[] wordsToFind)
        {
            int count = 0;
            foreach (var wordToFind in wordsToFind)
            {
                int currentIndex = 0;
                while ((currentIndex = textToQuery.IndexOf(wordToFind, currentIndex,         StringComparison.Ordinal)) != -1)
                {
                    if (((currentIndex == 0) || //is it the first index?
                          (!Char.IsLetter(textToQuery, currentIndex - 1))) &&
                          ((currentIndex == (currentIndex + wordToFind.Length)) || //has the end been reached?
                          (!Char.IsLetter(textToQuery, currentIndex + wordToFind.Length))))
                    {
                        count++;
                    }
                    currentIndex++;
                }
            }
            return count;
        }

结论:正如您所看到的,这种方法比我的其他答案有点麻烦,并且性能较差(尽管比其他答案更高效)。 所以它真的取决于你想要达到的目标。 如果你的字符串中有简短的单词要找,你应该接受这个答案,因为例如'和'显然会与第一种方法返回太多匹配。

private int stringMatches(string textToQuery, string[] stringsToFind)
{
      string[] splitArray = textToQuery.Split(new char[] { ' ', ',','.' });
      var count = splitArray.Where(p => stringsToFind.Contains(p)).ToArray().Count();
      return count;
}

暂无
暂无

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM