
What's the best way to parse a string for "bad" words in C#?

I'm thinking of something like:

foreach (var word in paragraph.Split(' ')) {
  if (badWordArray.Contains(word)) {
    // do something about it
  }
}

but I'm sure there's a better way.

Thanks in advance!

UPDATE: I'm not looking to remove obscenities automatically... for my web app, I want to be notified if a word I deem "bad" is used. Then I'll review it myself to make sure it's legit. An auto-flagging system of sorts.

While your approach works, it may be a bit slow. There is a wonderful response here to a previous SO question. Though that question is about PHP rather than C#, I think it can be easily ported.

Edit to add sample code:

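// Requires: using System.Text.RegularExpressions;
public string FilterWords(string inputWords) {
    // Alternation of the words to filter; matches them anywhere in the input.
    Regex wordFilter = new Regex("(puppies|kittens|dolphins|crabs)");
    return wordFilter.Replace(inputWords, "<3");
}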

That should work for you, more or less.

Edit to answer the OP's clarification:

I'm not looking to remove obscenities automatically... for my web app, I want to be notified if a word I deem "bad" is used.

Much like the replacement above, you can check whether anything matches like so:

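// Requires: using System.Text.RegularExpressions;
public bool HasBadWords(string inputWords) {
    Regex wordFilter = new Regex("(puppies|kittens|dolphins|crabs)");
    // True if any listed word occurs anywhere in the input.
    return wordFilter.IsMatch(inputWords);
}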

It will return true if the string you pass to it contains any of the words in the list.
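Since the goal is flagging for manual review rather than replacing, it may help to know which words were found. What follows is only a minimal sketch: the class name BadWordFlagger and the method FindBadWords are hypothetical, and the \b word boundaries and RegexOptions.IgnoreCase are additions that the pattern above does not have. They are meant to catch "Kittens" and to avoid matching a listed word buried inside a longer one.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class BadWordFlagger
{
    // Same placeholder words as above; \b and IgnoreCase are assumptions
    // added for this sketch, not part of the original pattern.
    private static readonly Regex WordFilter = new Regex(
        @"\b(puppies|kittens|dolphins|crabs)\b",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Returns each distinct listed word found in the input, lowercased,
    // so the caller can attach it to a review notification.
    public static List<string> FindBadWords(string input)
    {
        return WordFilter.Matches(input)
            .Cast<Match>()
            .Select(m => m.Value.ToLowerInvariant())
            .Distinct()
            .ToList();
    }
}
```

An empty list means nothing was flagged; a non-empty one can be stored alongside the post for the reviewer.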

At my job we put some automatic bad-word filtering into our software (it's kind of shocking to be browsing the source and suddenly run across an array containing several pages of obscenity).

One tip is to pre-process the user input before testing it against your list, in case someone is trying to sneak something by you. So by way of preprocessing, we

  • uppercase everything in the input
  • remove most non-alphanumerics (that is, just splice out any spaces, punctuation, etc.)
  • and then, assuming someone is trying to pass off digits as letters, do something like this: replace zero with O, 9 with G, 5 with S, etc. (get creative)

And then get some friends to try to break it. It's fun.
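The preprocessing steps above could be sketched roughly as follows. The class and method names (InputNormalizer, Normalize) are hypothetical, and the substitution table contains only the three digit swaps mentioned above; a real list would presumably be longer.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class InputNormalizer
{
    // Digit-to-letter swaps taken from the list above; extend as needed.
    private static readonly Dictionary<char, char> LookAlikes =
        new Dictionary<char, char>
        {
            { '0', 'O' }, { '9', 'G' }, { '5', 'S' }
        };

    public static string Normalize(string input)
    {
        var sb = new StringBuilder(input.Length);
        foreach (char c in input.ToUpperInvariant())
        {
            // Drop spaces, punctuation, and other non-alphanumerics.
            if (!char.IsLetterOrDigit(c)) continue;

            // Replace a look-alike digit with its letter, else keep the char.
            char mapped;
            sb.Append(LookAlikes.TryGetValue(c, out mapped) ? mapped : c);
        }
        return sb.ToString();
    }
}
```

For example, Normalize("d0lphin5!") yields "DOLPHINS". Note that once spaces are stripped, word boundaries are gone, so the word list has to be matched as uppercase substrings of the normalized text. That tends to produce more false positives, which is one reason to keep a human review step.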

You could consider using a hash-based collection such as HashSet<T> or Dictionary<TKey, TValue> instead of the array. Contains() on an array scans every element, whereas HashSet<T>.Contains() or Dictionary.ContainsKey() is a hash lookup that takes roughly constant time on average. This is especially true if you have a large list of profanities (not sure how many there are! :)
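As a rough illustration of that lookup, here is a sketch using HashSet<string>. The class name BadWordLookup, the placeholder word list, and the separator characters are all assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;

public static class BadWordLookup
{
    // Hypothetical list; OrdinalIgnoreCase makes lookups case-insensitive.
    private static readonly HashSet<string> BadWords = new HashSet<string>(
        new[] { "puppies", "kittens", "dolphins", "crabs" },
        StringComparer.OrdinalIgnoreCase);

    // Assumed separators, so trailing punctuation does not hide a word.
    private static readonly char[] Separators =
        { ' ', '\t', '\r', '\n', ',', '.', '!', '?', ';', ':' };

    public static bool ContainsBadWord(string paragraph)
    {
        string[] words = paragraph.Split(
            Separators, StringSplitOptions.RemoveEmptyEntries);
        foreach (string word in words)
        {
            // Hash lookup per word instead of scanning the whole list.
            if (BadWords.Contains(word)) return true;
        }
        return false;
    }
}
```

The cost then grows with the number of words in the paragraph rather than with the size of the word list.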
