
How do I count the number of noise words in a string?

Suppose I have a list of noise words...

string[] noise = new[] {"and", "it", "in"}; // etc, etc

...and I have a string s. I want to know how many noise words exist in s.

I know I can do this by splitting s on spaces, then looping through the resulting array and checking each word for a match in noise, but this seems like a very inefficient way of doing it. It feels like there ought to be a neat RegEx or LINQ way to do it.

Any suggestions?

LINQ isn't more efficient than a loop, but it is often more readable and concise, and I guess that's what you wanted. In this case you can use Enumerable.Count and Contains (both need using System.Linq):

int countNoiseWords = s.Split().Count(noise.Contains);

The case-insensitive way:

int countNoiseWords = s.Split()
    .Count(w => noise.Contains(w, StringComparer.InvariantCultureIgnoreCase));

If the noise list is very long, you should consider using a HashSet<string> instead of an array or list. Contains on a HashSet<string> is a hash lookup, while Contains on an array scans every element for each word in s.
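As a rough sketch of the HashSet<string> suggestion: the class name NoiseWordCounter and the method CountNoiseWords are made up for illustration, and the choice of StringComparer.OrdinalIgnoreCase and of StringSplitOptions.RemoveEmptyEntries are my assumptions, not something from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only.
public static class NoiseWordCounter
{
    // The comparer is passed to the set, so lookups are case-insensitive.
    // OrdinalIgnoreCase is an assumption; use another comparer if you need culture rules.
    private static readonly HashSet<string> Noise =
        new HashSet<string>(new[] { "and", "it", "in" }, StringComparer.OrdinalIgnoreCase);

    public static int CountNoiseWords(string s)
    {
        // A null separator array splits on any whitespace;
        // RemoveEmptyEntries drops empty strings produced by repeated spaces.
        return s.Split((char[])null, StringSplitOptions.RemoveEmptyEntries)
                .Count(w => Noise.Contains(w));
    }
}
```

For example, NoiseWordCounter.CountNoiseWords("It is in the box and on it") returns 4. Note that this still splits on whitespace only, so a word with punctuation attached (such as "it,") would not be counted.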


 