How can I compare a string to a “filter” list in LINQ?

I'm trying to filter a collection of strings by a "filter" list, that is, a list of bad words. If a string contains a word from the list, I don't want it.

This is what I have so far; the bad word here is "frakk":

string[] filter = { "bad", "words", "frakk" };

string[] foo = 
{ 
    "this is a lol string that is allowed", 
    "this is another lol frakk string that is not allowed!"
};

var items = from item in foo 
            where (item.IndexOf( (from f in filter select f).ToString() ) == 0)
            select item;

But this isn't working. Why?

You can use Any + Contains:

var items = foo.Where(s => !filter.Any(w => s.Contains(w)));

If you want to compare case-insensitively:

var items = foo.Where(s => !filter.Any(w => s.IndexOf(w, StringComparison.OrdinalIgnoreCase) >= 0));
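For reference, here is a minimal, self-contained sketch of the case-insensitive variant above. The class name BadWordFilter and the method RemoveMatches are made up for illustration; only the Where/Any/IndexOf expression comes from the answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper class; the names are illustrative only.
public static class BadWordFilter
{
    // Keeps only the strings that contain none of the filter words
    // as a substring, ignoring case.
    public static List<string> RemoveMatches(IEnumerable<string> source, IEnumerable<string> filter)
    {
        var words = filter.ToList();
        return source
            .Where(s => !words.Any(w => s.IndexOf(w, StringComparison.OrdinalIgnoreCase) >= 0))
            .ToList();
    }

    public static void Main()
    {
        string[] filter = { "bad", "words", "frakk" };
        string[] foo =
        {
            "this is a lol string that is allowed",
            "this is another lol FRAKK string that is not allowed!"
        };

        // Only the first sentence survives the filter.
        foreach (var item in RemoveMatches(foo, filter))
            Console.WriteLine(item);
    }
}
```

Keep in mind that this is substring matching, so a harmless word such as "badge" would also be rejected because it contains "bad".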

Update: If you want to exclude sentences where at least one word is in the filter list, you can use String.Split() and Enumerable.Intersect:

var items = foo.Where(sentence => !sentence.Split().Intersect(filter).Any());

Enumerable.Intersect is very efficient because it uses a set under the hood. It is more efficient to put the long sequence first. Thanks to LINQ's deferred execution, the query stops at the first matching word.

(Note that Split with no arguments also splits on other white-space characters such as tab or newline.)
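A small sketch of the Split + Intersect approach as a complete program, in case it helps. The names WholeWordFilter and RemoveMatches are invented here, and passing StringComparer.OrdinalIgnoreCase to Intersect is my own addition to make the whole-word match case-insensitive; drop it if you want exact matches.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper class; the names are illustrative only.
public static class WholeWordFilter
{
    // Keeps only the sentences in which no white-space-separated word
    // equals a filter word (compared case-insensitively).
    public static List<string> RemoveMatches(IEnumerable<string> source, IEnumerable<string> filter)
    {
        var filterWords = filter.ToList();
        return source
            .Where(sentence => !sentence
                // A null separator array means "split on any white space".
                .Split((char[])null, StringSplitOptions.RemoveEmptyEntries)
                .Intersect(filterWords, StringComparer.OrdinalIgnoreCase)
                .Any())
            .ToList();
    }

    public static void Main()
    {
        string[] filter = { "bad", "words", "frakk" };
        string[] foo =
        {
            "this is a lol string that is allowed",
            "this is another lol frakk string that is not allowed!"
        };

        foreach (var item in RemoveMatches(foo, filter))
            Console.WriteLine(item);
    }
}
```

Unlike the Contains version, "badge" is not rejected here, but "frakk!" slips through as well because the punctuation stays attached to the word.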

The first problem you need to solve is breaking up the sentence into a series of words. The simplest way to do this is to split on spaces:

string[] words = sentence.Split(new char[] {' '}, StringSplitOptions.RemoveEmptyEntries);

From there you can use a simple LINQ expression to find the profanities:

var badWords = words.Where(x => filter.Contains(x));

However, this is a fairly primitive solution. It won't handle a number of more complex cases that you likely need to think about:

  • There are many characters which qualify as a space. My solution only uses ' '.
  • The split doesn't handle punctuation, so dog! won't be viewed as dog. It is probably much better to break up words on legal characters.
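One possible way to address both points is to extract runs of "legal" characters with a regular expression instead of splitting on ' '. This is only a sketch: TokenFilter, Tokenize and FindBadWords are hypothetical names, and treating \w+ as the definition of a word is an assumption (it also accepts digits and underscores, and splits "don't" into "don" and "t").

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper class; the names are illustrative only.
public static class TokenFilter
{
    // Yields each run of word characters, so "dog!" produces "dog" and
    // tabs or newlines separate words just like spaces do.
    public static IEnumerable<string> Tokenize(string sentence)
    {
        return Regex.Matches(sentence, @"\w+").Cast<Match>().Select(m => m.Value);
    }

    // Returns the words of the sentence that appear in the filter list,
    // compared case-insensitively.
    public static List<string> FindBadWords(string sentence, IEnumerable<string> filter)
    {
        var badWords = new HashSet<string>(filter, StringComparer.OrdinalIgnoreCase);
        return Tokenize(sentence).Where(w => badWords.Contains(w)).ToList();
    }

    public static void Main()
    {
        string[] filter = { "bad", "words", "frakk" };
        foreach (var word in FindBadWords("What the frakk! Such bad\twords.", filter))
            Console.WriteLine(word);
    }
}
```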

The reason your initial attempt didn't work is that this line:

(from f in filter select f).ToString()

evaluates to the type name of the array iterator that the LINQ expression produces, not to the words in the filter. So you're actually searching each phrase for the following string:

System.Linq.Enumerable+WhereSelectArrayIterator`2[System.String,System.String]

rather than for the words of the filter.
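You can check this yourself with a few lines. The sketch below uses a made-up class ToStringDemo with a helper Describe; the exact type name it prints depends on the .NET runtime version, so treat the name quoted above as one example rather than a guarantee.

```csharp
using System;
using System.Linq;

// Hypothetical demo class; the names are illustrative only.
public static class ToStringDemo
{
    // Calling ToString() on the query returns the iterator's type name,
    // not the filter words themselves.
    public static string Describe(string[] filter)
    {
        return (from f in filter select f).ToString();
    }

    public static void Main()
    {
        string[] filter = { "bad", "words", "frakk" };
        string text = Describe(filter);

        // Prints a type name that starts with "System.Linq.".
        Console.WriteLine(text);

        // The phrase does not contain that type name, so IndexOf returns -1
        // and the original "== 0" test is never true.
        string phrase = "this is another lol frakk string that is not allowed!";
        Console.WriteLine(phrase.IndexOf(text, StringComparison.Ordinal));
    }
}
```

Also note that even with a real word, IndexOf(...) == 0 is only true when the word sits at the very start of the string; a "contains" check needs >= 0.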
