Removing all standalone occurrences of a word from a string with regular expressions in Java
I need advice on how to replace a substring like @sometext without also replacing a substring like "@someothertext@somemail.com".
For example, given a string like:
An example with @sometext and also with "@someothertext@somemail.com" sometextafter
After the replacement, the string above should look like:
An example with and also with "@someothertext@somemail.com" sometextafter
After getting the string from a field, I'm using:
String textMod = someText.replaceAll("( |^)[^\"]@[^@]+?( |$)","");
someText = textMod + "@\"" + someone.getEmail() + "\" ";
And then I set this string back into the field.
You can match a standalone occurrence with a regex this way:
\b@sometext\b
Putting \b before and after @sometext is meant to make sure that it is a standalone word, not part of another word like @someothertext@sometext.com. If a match is found, the result is put in $match, and you can then do whatever you want with $match.
Hope this helps.
From https://docs.oracle.com/javase/tutorial/essential/regex/bounds.html
The \b in the pattern indicates a word boundary, so only the distinct word "web" is matched, and not a partial word like "webbing" or "cobweb".
if (preg_match("/\bweb\b/i", "PHP is the web scripting language of choice.")) {
echo "A match was found.";
}
^ PHP example, but you get the point.
If there is always a space before and after the tag you want to replace, this may be enough:
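Since the question is about Java, here is a rough Java counterpart of the PHP snippet, as a sketch only. The class and method names (WordBoundaryDemo, containsWord) are made up for illustration. It also checks one thing worth knowing before relying on \b here: '@' is not a word character, so a \b placed directly before '@' only matches when a word character precedes the '@'.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of any answer above.
class WordBoundaryDemo {
    // Wraps a literal word in \b ... \b, like /\bweb\b/i in the PHP snippet.
    static boolean containsWord(String text, String word) {
        Pattern p = Pattern.compile("\\b" + Pattern.quote(word) + "\\b",
                                    Pattern.CASE_INSENSITIVE);
        return p.matcher(text).find();
    }

    public static void main(String[] args) {
        // Plain words behave as the quoted tutorial text describes.
        System.out.println(containsWord("PHP is the web scripting language of choice.", "web")); // true
        System.out.println(containsWord("A spider spins a cobweb.", "web"));                      // false

        // Between a space and '@' there is no word boundary, so this finds nothing.
        System.out.println(containsWord("hello @sometext there", "@sometext")); // false
        // Between 'l' and '@' there is one, so the e-mail-like form is what matches.
        System.out.println(containsWord("mail@sometext there", "@sometext"));   // true
    }
}
```

If that behaviour is not what you want for @-prefixed words, a lookaround-based pattern may be a better fit than \b.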
/\s(@\w+)\s/g
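A minimal Java sketch of the same pattern, assuming the matched tag plus its surrounding whitespace should collapse into a single space (the answer does not say what to replace it with). The class and method names are hypothetical.

```java
// Hypothetical helper for illustration.
class SpaceDelimitedTagDemo {
    static String removeTags(String text) {
        // \s(@\w+)\s : an @-word with whitespace on both sides.
        // Replacing with one space is an assumption, so the neighbours stay separated.
        return text.replaceAll("\\s(@\\w+)\\s", " ");
    }

    public static void main(String[] args) {
        String input = "An example with @sometext and also with "
                + "\"@someothertext@somemail.com\" sometextafter";
        System.out.println(removeTags(input));
        // prints: An example with and also with "@someothertext@somemail.com" sometextafter
    }
}
```

Note that a tag at the very start or end of the string has no whitespace on one side, so this pattern leaves it alone.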
This should fit your needs:
str = str.replaceAll("@\\w+[^@]", "");
Try this:
(?<!\w)@[^@\s]+(?!\S)
See it here on Regexr.
Match an @, but only if there is no word character (\w) before it; that is what (?<!\w) checks. Then match a sequence of characters that are neither @ nor whitespace (\s), but only if it is not followed by a non-whitespace character (\S).
(?<!\w) is called a negative lookbehind assertion.
[^@\s] is called a negated character class; it matches any character that is not part of the class.
(?!\S) is a negative lookahead assertion.
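Here is a small Java sketch of that pattern applied to the string from the question. The class and method names are invented for illustration, and replacing with an empty string is an assumption, so the spaces on both sides of a removed tag are left in place.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration.
class StandaloneMentionDemo {
    // (?<!\w)  no word character right before the '@'
    // @[^@\s]+ the '@' and a run of characters that are neither '@' nor whitespace
    // (?!\S)   the next character is whitespace, or we are at the end of the input
    private static final Pattern STANDALONE =
            Pattern.compile("(?<!\\w)@[^@\\s]+(?!\\S)");

    static String strip(String text) {
        return STANDALONE.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        String input = "An example with @sometext and also with "
                + "\"@someothertext@somemail.com\" sometextafter";
        // Only the tag itself is removed, so two spaces remain between "with" and "and".
        System.out.println(strip(input));
        // Tags at the start and end of the string are matched as well.
        System.out.println("[" + strip("@start middle @end") + "]"); // prints: [ middle ]
    }
}
```

If the leftover double spaces matter, one option is a follow-up pass that collapses runs of spaces.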
Simply adding spaces before and after "@sometext" would not work if "@sometext" is at the start or end of a sentence. However, just adding a pattern that checks for the start or end of the sentence would not work either: when you match "@sometext " at the start of a sentence and leave a space " " behind, the resulting string looks strange. The same goes for the end of a sentence.
We need to split the regex replace into two actions and perform two separate regex replaces:
str = str.replaceAll(" @sometext ", " ");
str = str.replaceAll("^@sometext | @sometext$|(?:@sometext ){2,}", "");
^ means start of line, $ means end of line.
EDIT: Added corner case handling for when several @sometext's follow each other.
(C#, regex based)
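To see the two replaces working together, here is a small runnable sketch. The wrapper class and method name are hypothetical; the two replaceAll calls are the ones shown above.

```java
// Hypothetical wrapper for illustration.
class TwoStepRemovalDemo {
    static String removeSometext(String str) {
        // Step 1: a tag in the middle of the text collapses to a single space.
        str = str.replaceAll(" @sometext ", " ");
        // Step 2: a tag at the start or end is removed together with its one adjacent space.
        str = str.replaceAll("^@sometext | @sometext$|(?:@sometext ){2,}", "");
        return str;
    }

    public static void main(String[] args) {
        String input = "An example with @sometext and also with "
                + "\"@someothertext@somemail.com\" sometextafter";
        System.out.println(removeSometext(input));
        // prints: An example with and also with "@someothertext@somemail.com" sometextafter

        System.out.println(removeSometext("@sometext at start and at end @sometext"));
        // prints: at start and at end
    }
}
```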
//match @xxx sequences, but only if i can look back and NOT see a @xxx immediately preceding me, and if I don't end with a @
string input = @"[An example with @hello and also with ""@@hello@somemail.com"" sometext @lastone";
var pattern = @"(?<!@\w+)(?>@\w+)(?!@)";
var matches = Regex.Matches(input, pattern);
myString = myString.replaceAll(" @hello ", " ");
If @hello is a single word, then it has spaces before and after it, right? So you should find every @hello with a space before and after it and replace it with a single space.
If you need to remove not only @hello but every word that starts with @ and contains no other @, use this:
myString = myString.replaceAll(" @[^@]+? ", " ");
[^@] is any symbol except @. +? means match at least one character, as few as possible, until reaching the first space.
If you want to remove only words made of alphanumeric characters, use \\w instead of [^@].
EDIT:
Yeah, ohaal's right. To make it match at the start and the end of the string, use this pattern:
( |^)@[^@]+?( |$)
myString = myString.replaceAll("( |^)@hello( |$)", " ");
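A quick sketch of the general pattern from the edit, run against the string from the question. The class and method names are hypothetical. Because the replacement is always a single space, a tag at the very start or end leaves one space behind; calling trim() afterwards is one possible way to tidy that up, which is an addition here and not part of the answer.

```java
// Hypothetical helper for illustration.
class SpaceOrEdgeDemo {
    static String removeAtWords(String s) {
        // ( |^)   a space, or the start of the string
        // @[^@]+? an '@' followed by as few non-'@' characters as possible
        // ( |$)   a space, or the end of the string
        return s.replaceAll("( |^)@[^@]+?( |$)", " ");
    }

    public static void main(String[] args) {
        String input = "An example with @sometext and also with "
                + "\"@someothertext@somemail.com\" sometextafter";
        System.out.println(removeAtWords(input));
        // prints: An example with and also with "@someothertext@somemail.com" sometextafter

        // A leading tag is replaced by a space, so the result starts with one.
        System.out.println("[" + removeAtWords("@hello world") + "]"); // prints: [ world]
    }
}
```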