
Removing all standalone occurrences of a word from a string with regular expressions in Java

I need advice on how to remove a substring like @sometext, but leave a substring like "@someothertext@somemail.com" untouched.

For example, given a string like:

An example with @sometext and also with "@someothertext@somemail.com" sometextafter

After replacing the substrings in the string above, the result should look like:

An example with and also with "@someothertext@somemail.com" sometextafter

After getting the string from a field, I'm using:

String textMod = someText.replaceAll("( |^)[^\"]@[^@]+?( |$)","");
someText = textMod + "@\"" + someone.getEmail() + "\" ";

And then I set this string back into the field.

You can match a standalone occurrence with a regex this way:

\b@sometext\b

Putting \b before and after @sometext is meant to ensure that it is matched as a standalone word, not as part of another word like @someothertext@sometext.com. If a match is found, the result is placed in $match, and you can then do whatever you want with it.

Hope this helps.

From https://docs.oracle.com/javase/tutorial/essential/regex/bounds.html

The \b in the pattern indicates a word boundary, so only the distinct word "web" is matched, and not a partial word like "webbing" or "cobweb".

if (preg_match("/\bweb\b/i", "PHP is the web scripting language of choice.")) {
    echo "A match was found.";
}

^ This is a PHP example, but you get the point.
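Since the question is about Java, here is a minimal sketch of the same word-boundary check with java.util.regex. The class name WordBoundaryDemo and the helper found are made up for illustration. It also shows one thing worth checking before relying on \b next to @: the @ character is not a word character, so a \b placed directly before @ only matches when a word character sits to its left.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative, not from the original answer.
class WordBoundaryDemo {
    // Returns true if the regex is found anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // \bweb\b matches "web" only as a whole word.
        System.out.println(found("\\bweb\\b", "the web scripting language"));
        System.out.println(found("\\bweb\\b", "webbing and cobweb"));

        // '@' is a non-word character, so \b before '@' requires a word
        // character immediately to the left of the '@'.
        System.out.println(found("\\b@sometext\\b", "with @sometext here"));
        System.out.println(found("\\b@sometext\\b", "x@sometext here"));
    }
}
```

In a Java string literal each regex backslash has to be doubled, which is why the pattern is written as "\\bweb\\b".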

If there is always a space before and after the tag you want to replace, this may be enough:

/\s(@\w+)\s/g

This should fit your needs:

str = str.replaceAll("@\\w+[^@]", "");

Try this:

(?<!\w)@[^@\s]+(?!\S)

See it here on Regexr.

Match an @, but only if there is no word character \w before it: (?<!\w). Then match a sequence of characters that are neither @ nor whitespace \s, but only if it is not followed by a non-whitespace character \S.

(?<!\w) is called a negative lookbehind assertion.

[^@\s] is called a negated character class; it matches anything that is not part of the class.

(?!\S) is a negative lookahead assertion.
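A minimal Java sketch of this pattern applied to the string from the question. The class name LookaroundDemo, the constant STANDALONE_TAG and the method removeTags are made up for illustration; only the regex itself comes from the answer.

```java
// Hypothetical demo class; only the regex is taken from the answer above.
class LookaroundDemo {
    // (?<!\w)   no word character directly before the '@'
    // @[^@\s]+  an '@' followed by characters that are neither '@' nor whitespace
    // (?!\S)    no non-whitespace character directly after the match
    static final String STANDALONE_TAG = "(?<!\\w)@[^@\\s]+(?!\\S)";

    static String removeTags(String text) {
        return text.replaceAll(STANDALONE_TAG, "");
    }

    public static void main(String[] args) {
        String input = "An example with @sometext and also with "
                + "\"@someothertext@somemail.com\" sometextafter";
        System.out.println(removeTags(input));
    }
}
```

Because the lookarounds do not consume the surrounding whitespace, both spaces around a removed tag stay in the result. If that matters, the extra space would need to be collapsed in a separate step.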

Simply adding spaces before and after "@sometext" would not work if "@sometext" is at the start or end of a sentence. However, just adding a pattern that checks for the start or end of the sentence would not work either: if you match "@sometext " at the start of a sentence and leave a space " " in its place, the resulting string will look strange. The same goes for the end of a sentence.

We need to split the regex replace into two actions and perform two separate regex replaces:

str = str.replaceAll(" @sometext ", " ");
str = str.replaceAll("^@sometext | @sometext$|(?:@sometext ){2,}", "");

^ matches the start of the input and $ matches the end (or the start and end of each line when Pattern.MULTILINE is set).

EDIT: Added corner-case handling for when several @sometext occurrences follow one another.
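The two replacements can be wrapped in a small helper to try them on the string from the question. This is only a sketch: the class name TwoStepReplaceDemo and the method removeTag are made up for illustration, and the tag is hard-coded as @sometext exactly as in the answer.

```java
// Hypothetical demo class wrapping the two replaceAll calls from the answer.
class TwoStepReplaceDemo {
    static String removeTag(String str) {
        // Step 1: a tag surrounded by spaces collapses to a single space.
        str = str.replaceAll(" @sometext ", " ");
        // Step 2: a tag at the very start or end (or repeated tags) is dropped
        // together with its neighbouring space.
        str = str.replaceAll("^@sometext | @sometext$|(?:@sometext ){2,}", "");
        return str;
    }

    public static void main(String[] args) {
        System.out.println(removeTag("An example with @sometext and also with "
                + "\"@someothertext@somemail.com\" sometextafter"));
        System.out.println(removeTag("@sometext at the start"));
        System.out.println(removeTag("at the end @sometext"));
    }
}
```

For a tag that is not known in advance and may contain regex metacharacters, wrapping it with Pattern.quote before building the pattern would be one option.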

(C#, regex based)

// Match @xxx sequences, but only if I can look back and NOT see a @xxx immediately preceding me, and if I don't end with a @
string input = @"[An example with @hello and also with ""@@hello@somemail.com"" sometext @lastone";
var pattern = @"(?<!@\w+)(?>@\w+)(?!@)";
var matches = Regex.Matches(input, pattern);

myString = myString.replaceAll(" @hello ", " ");

If @hello is a single word, then it has spaces before and after, right? So you should find every @hello with a space before and after it and replace it with a single space.

If you need to remove not only @hello but all words that start with @ and contain no other @, use this:

myString = myString.replaceAll(" @[^@]+? ", " ");

[^@] is any symbol except @. +? is a reluctant quantifier: it matches at least one character and as few as possible, so it stops at the first space.

If you want to remove only words made of alphanumeric characters, use \\w instead of [^@].

EDIT:

Yeah, ohaal's right. To make it match at the start and the end of the string, use this pattern:

( |^)@[^@]+?( |$)

myString = myString.replaceAll("( |^)@hello( |$)", " ");
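A minimal sketch of the generic pattern in Java, assuming the goal is to replace any standalone @word with a single space. The class name AnchoredAlternationDemo and the method removeTags are made up for illustration.

```java
// Hypothetical demo class; only the regex is taken from the answer above.
class AnchoredAlternationDemo {
    // ( |^)    a space or the start of the input
    // @[^@]+?  an '@' followed by as few non-'@' characters as possible
    // ( |$)    a space or the end of the input
    static String removeTags(String text) {
        return text.replaceAll("( |^)@[^@]+?( |$)", " ");
    }

    public static void main(String[] args) {
        System.out.println(removeTags("An example with @sometext and also with "
                + "\"@someothertext@somemail.com\" sometextafter"));
        System.out.println(removeTags("@hello world"));
        System.out.println(removeTags("say @hello"));
    }
}
```

Note that a tag at the start or end of the input leaves a leading or trailing space behind, and because each match consumes the space after the tag, two tags separated by a single space are not both removed in one pass.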
