How can I compare a string to a “filter” list in LINQ?
I'm trying to filter a collection of strings by a "filter" list, which is a list of bad words. If a string contains a word from the list, I don't want it.
Here's what I've got so far. The bad word here is "frakk":
string[] filter = { "bad", "words", "frakk" };
string[] foo =
{
"this is a lol string that is allowed",
"this is another lol frakk string that is not allowed!"
};
var items = from item in foo
where (item.IndexOf( (from f in filter select f).ToString() ) == 0)
select item;
But this isn't working. Why?
You can use Any + Contains:
var items = foo.Where(s => !filter.Any(w => s.Contains(w)));
If you want to compare case-insensitively:
var items = foo.Where(s => !filter.Any(w => s.IndexOf(w, StringComparison.OrdinalIgnoreCase) >= 0));
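A minimal, self-contained sketch of the two Any-based filters above, wrapped as reusable methods. The class and method names (SubstringFilter, Allowed, AllowedIgnoreCase) are hypothetical, chosen only for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper wrapping the two Any-based filters.
public static class SubstringFilter
{
    // Keeps a sentence only if no filter word occurs anywhere inside it.
    // string.Contains(string) is an ordinal, case-sensitive substring search.
    public static IEnumerable<string> Allowed(IEnumerable<string> sentences, string[] filter) =>
        sentences.Where(s => !filter.Any(w => s.Contains(w)));

    // Same idea, but IndexOf with OrdinalIgnoreCase also catches "FRAKK" or "Frakk".
    public static IEnumerable<string> AllowedIgnoreCase(IEnumerable<string> sentences, string[] filter) =>
        sentences.Where(s => !filter.Any(w => s.IndexOf(w, StringComparison.OrdinalIgnoreCase) >= 0));
}
```

With the question's arrays, Allowed(foo, filter) keeps only the first sentence. Because both versions look for substrings, a sentence such as "I like badminton" is also rejected, since it contains "bad"; the case-insensitive version additionally rejects "FRAKK off".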
Update: If you want to exclude sentences where at least one word is in the filter list, you can use String.Split() and Enumerable.Intersect:
var items = foo.Where(sentence => !sentence.Split().Intersect(filter).Any());
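A sketch of the Split + Intersect version as a reusable method (WordFilter and Allowed are hypothetical names). Unlike the substring check, it compares whole words, so "badminton" no longer matches "bad":

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for the whole-word Split + Intersect filter.
public static class WordFilter
{
    // Split() with no arguments splits on any white-space character;
    // a sentence is rejected only if one of its words equals a filter word exactly.
    public static IEnumerable<string> Allowed(IEnumerable<string> sentences, string[] filter) =>
        sentences.Where(sentence => !sentence.Split().Intersect(filter).Any());
}
```

The flip side is that punctuation stays attached to words: "frakk!" is not equal to "frakk", so a sentence ending in "frakk!" passes this filter.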
Enumerable.Intersect is very efficient because it uses a set under the hood: it loads the second sequence into a set and then streams through the first one. That's why it's more efficient to put the long sequence first. Thanks to LINQ's deferred execution, Any() stops at the first matching word.
(Note that the parameterless Split() also splits on other white-space characters, such as tab or newline.)
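To make the efficiency argument concrete, here is a simplified, hypothetical re-implementation of the idea behind Enumerable.Intersect. It is not the actual .NET source, only an illustration of the behaviour described above: the second sequence is buffered into a HashSet<T>, and the first is streamed one element at a time.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch of how an Intersect-style operator can work.
public static class IntersectSketch
{
    public static IEnumerable<T> IntersectLazy<T>(IEnumerable<T> first, IEnumerable<T> second)
    {
        // The whole second sequence is hashed up front (on the first MoveNext call).
        var set = new HashSet<T>(second);

        // The first sequence is pulled lazily, one element at a time.
        foreach (T item in first)
        {
            // Remove returns true only the first time, so duplicates are yielded once.
            if (set.Remove(item))
                yield return item;
        }
    }
}
```

This is why the long sequence belongs in the first position: only the short filter list gets hashed, and the sentence's words are pulled only until Any() sees the first match.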
The first problem you need to solve is breaking up the sentence into a series of words. The simplest way to do this is based on spaces:
string[] words = sentence.Split(new char[] {' '}, StringSplitOptions.RemoveEmptyEntries);
From there you can use a simple LINQ expression to find the profanities:
var badWords = words.Where(x => filter.Contains(x));
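Putting the two snippets above together, a minimal sketch that returns the filter words found in one sentence (SpaceSplit and FindBadWords are hypothetical names):

```csharp
using System;
using System.Linq;

// Hypothetical helper combining the space split with the LINQ lookup.
public static class SpaceSplit
{
    public static string[] FindBadWords(string sentence, string[] filter)
    {
        // Split only on ' ' and drop the empty strings produced by consecutive spaces.
        string[] words = sentence.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        // Keep the words that appear verbatim in the filter list.
        return words.Where(x => filter.Contains(x)).ToArray();
    }
}
```

With this split, "what the frakk!" yields no matches because the token is "frakk!", and "bad\twords" stays a single token because only ' ' is treated as a separator.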
However, this is a bit of a primitive solution. It won't handle a number of complex cases that you likely need to think about:

- The split only uses ' ', so other white-space characters such as tabs or newlines don't separate words.
- Punctuation stays attached to words, so dog! won't be viewed as dog.
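Both cases above can be reproduced, together with one way of breaking words up on legal characters, in a short sketch. Treating Unicode letters and decimal digits as the legal word characters is an assumption, and WordSplitting and its methods are hypothetical names:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper comparing a space split with a "legal characters" split.
public static class WordSplitting
{
    // The naive approach: only ' ' separates words, punctuation stays attached.
    public static string[] SplitOnSpaces(string sentence) =>
        sentence.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

    // Extracts runs of letters (\p{L}) and decimal digits (\p{Nd});
    // everything else, including tabs, newlines and punctuation, acts as a separator.
    public static string[] SplitOnWordCharacters(string sentence) =>
        Regex.Matches(sentence, @"[\p{L}\p{Nd}]+")
             .Cast<Match>()
             .Select(m => m.Value)
             .ToArray();

    // True if any extracted word equals a filter word (ordinal, case-sensitive).
    public static bool ContainsBadWord(string sentence, string[] filter) =>
        SplitOnWordCharacters(sentence).Any(w => filter.Contains(w));
}
```

With this definition, "dog!" yields dog and a tab separates words, but a contraction such as "don't" is split into "don" and "t"; widen the character class if that matters for your data.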
It's probably much better to break up words on legal characters.

The reason your initial attempt didn't work is that this line:
(from f in filter select f).ToString()
evaluates to the type name of the iterator object that the LINQ expression creates. The query isn't executed here, and because that iterator doesn't override ToString(), you get the default Object.ToString() result, which is the type name. So IndexOf is actually searching each phrase for the following string:
System.Linq.Enumerable+WhereSelectArrayIterator`2[System.String,System.String]
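rather than for the words of the filter. None of your phrases contain that text, so IndexOf returns -1 for every item and the query returns nothing. (The exact type name varies between .NET versions.)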
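The diagnosis can be checked directly. This sketch (QueryToStringDemo and its methods are hypothetical names) evaluates the expression from the question and runs the original query. Since the exact type name depends on the .NET version, only its System.Linq. prefix should be relied on:

```csharp
using System.Linq;

// Hypothetical demo reproducing the bug from the question.
public static class QueryToStringDemo
{
    // Returns the iterator's type name, not the filter words:
    // the query is never enumerated, and ToString() is not overridden.
    public static string QueryText(string[] filter) =>
        (from f in filter select f).ToString();

    // The question's query, materialized so the result can be inspected.
    // IndexOf(...) == 0 only matches phrases that start with the type name.
    public static string[] OriginalQuery(string[] foo, string[] filter) =>
        (from item in foo
         where item.IndexOf((from f in filter select f).ToString()) == 0
         select item).ToArray();
}
```

OriginalQuery returns an empty array for the question's data, which matches the "not working" symptom.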