Java & Regex: Matching a substring that is not preceded by specific characters

Question

This is one of those questions that has been asked and answered hundreds of times over, but I'm having a hard time adapting other solutions to my needs.

In my Java application I have a method for censoring bad words in chat messages. It works for most of my words, but there is one particular (and popular) curse word that I can't seem to get rid of. The word is "faen" (which is simply modern slang for "satan" in the language in question).

Using the pattern "fa+e+n" to match multiple A's and E's actually works; however, in this language, the word for "that couch" or "that sofa" is "sofaen". I've tried a lot of different approaches, using variations of [^so] and (?!=so), but so far I haven't been able to find a way to match one and not the other.

The real goal here is to be able to match the bad words regardless of the number of vowels, and regardless of any non-letters in between the components of the word.

Here are a few examples of what I'm trying to do:

"String containing faen"                        Should match
"String containing sofaen"                      Should not match
"Non-letter-censored string with f-a@a-e.n"     Should match
"Non-letter-censored string with sof-a@a-e.n"   Should not match

Any tips to set me off in the right direction on this?

Answer 1

You want something like \bf[^\s]*a[^\s]*e[^\s]*n\b . Note that this is the regular expression; if you want the Java string literal then you need to double the backslashes: "\\bf[^\\s]*a[^\\s]*e[^\\s]*n\\b" .

Note also that this isn't perfect (because [^\s]* also matches letters, it will hit harmless words such as "fallen"), but it does handle the situations that you have suggested.
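
For comparison, here is a minimal sketch of a stricter variant that checks the four examples from the question. It is not the pattern given above: it swaps the \b anchor for a negative lookbehind, (?<!\p{L}), so the "f" must not be preceded by a letter, and it only allows non-letter, non-whitespace characters between the components. The class name CensorSketch, the method names and the "****" replacement are made up for illustration, and treating whitespace as "not a separator" is an assumption.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original question or answers.
final class CensorSketch {

    // (?<!\p{L})       the 'f' must not directly follow a letter (rules out "sofaen")
    // [^\p{L}\s]*      any run of non-letter, non-whitespace separators, e.g. '-', '@', '.'
    // (?:a...)+ (?:e...)+  one or more a's, then one or more e's, each optionally followed by separators
    private static final Pattern FAEN = Pattern.compile(
            "(?<!\\p{L})f[^\\p{L}\\s]*(?:a[^\\p{L}\\s]*)+(?:e[^\\p{L}\\s]*)+n",
            Pattern.CASE_INSENSITIVE);

    static boolean containsBadWord(String message) {
        return FAEN.matcher(message).find();
    }

    static String censor(String message) {
        // Assumed replacement text; '*' has no special meaning in a replacement string.
        return FAEN.matcher(message).replaceAll("****");
    }

    public static void main(String[] args) {
        System.out.println(containsBadWord("String containing faen"));                      // true
        System.out.println(containsBadWord("String containing sofaen"));                    // false
        System.out.println(containsBadWord("Non-letter-censored string with f-a@a-e.n"));   // true
        System.out.println(containsBadWord("Non-letter-censored string with sof-a@a-e.n")); // false
        System.out.println(censor("What the f-a@a-e.n!"));                                  // What the ****!
    }
}
```

As for the attempts mentioned in the question: [^so] is a character class that consumes one character that is neither "s" nor "o", and (?!=so) is a negative lookahead for the literal text "=so". Java's negative lookbehind is written (?<!...), which is what the sketch relies on.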

Answer 2

It's a terrible idea to begin with. Do you think your users would write something like "f-aeen" to avoid your filter, but would not come up with "ffaen" or "-faen" or whatever other variation you did not prepare for? This is a race you cannot win, and the real loser is usability.

2 answers

Answer 1
2 2013-02-12 08:51:16

Answer 2
1 2013-02-12 08:53:08
