Regex to replace a repeating string pattern
I need to replace a repeated pattern within a word with its basic unit. For example, I have the string "TATATATA" and I want to replace it with "TA". I would probably also only replace more than 2 repetitions, to avoid replacing normal words.
I am trying to do it in Java with the replaceAll method.
I think you want this (works for any length of the repeated string):
String result = source.replaceAll("(.+)\\1+", "$1");
Or alternatively, to prioritize shorter matches (this is the variant that reduces "TATATATA" to "TA"; the greedy version above stops at "TATA"):
String result = source.replaceAll("(.+?)\\1+", "$1");
It first matches a group of one or more characters (any characters, not only letters), and then the same text again one or more times, using a back-reference within the pattern itself. I tried it and it seems to do the trick.
Example
String source = "HEY HEY duuuuuuude what'''s up? Trololololo yeye .0.0.0";
System.out.println(source.replaceAll("(.+?)\\1+", "$1"));
// HEY dude what's up? Trolo ye .0
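To make the difference between the two variants above concrete, here is a minimal sketch that runs both on the string from the question. The class and method names (GreedyVsReluctant, collapseGreedy, collapseReluctant) are made up for illustration and are not part of the original answer.

```java
class GreedyVsReluctant {
    // Greedy group: the engine tries the longest possible repeating unit first.
    static String collapseGreedy(String s) {
        return s.replaceAll("(.+)\\1+", "$1");
    }

    // Reluctant group: the engine tries the shortest possible repeating unit first.
    static String collapseReluctant(String s) {
        return s.replaceAll("(.+?)\\1+", "$1");
    }

    public static void main(String[] args) {
        System.out.println(collapseGreedy("TATATATA"));    // TATA
        System.out.println(collapseReluctant("TATATATA")); // TA
    }
}
```

With the greedy group, the longest unit that is followed by a copy of itself is "TATA", so the whole string collapses to that. The reluctant group finds "TA" first, which is the result the question asks for.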
You had better use a precompiled Pattern here rather than .replaceAll(): String.replaceAll() compiles the regex again on every call, whereas a Pattern constant is compiled once and reused. For instance:
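import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Compiled once; matches a whole word built from a repeated unit of 2+ uppercase letters
private static final Pattern PATTERN
    = Pattern.compile("\\b([A-Z]{2,}?)\\1+\\b");
//...
final Matcher m = PATTERN.matcher(input);
final String ret = m.replaceAll("$1");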
Edit: example:
public static void main(final String... args)
{
    System.out.println("TATATA GHRGHRGHRGHR"
        .replaceAll("\\b([A-Za-z]{2,}?)\\1+\\b", "$1"));
}
This prints:
TA GHR
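The question also mentions replacing only when there are more than 2 repetitions, so that ordinary words are left alone. One way this could be sketched, as an assumption on top of the pattern above rather than something the answer itself proposes, is to change \\1+ to \\1{2,}, which requires the unit to appear at least three times in total. The class name RepeatedWordCollapser and the method collapse are hypothetical.

```java
import java.util.regex.Pattern;

class RepeatedWordCollapser {
    // Assumed variant of the answer's pattern: a whole word made of a unit of
    // 2+ letters followed by at least two more copies of that unit.
    private static final Pattern REPEATED =
            Pattern.compile("\\b([A-Za-z]{2,}?)\\1{2,}\\b");

    // Replaces each matching word with its single repeating unit.
    static String collapse(String input) {
        return REPEATED.matcher(input).replaceAll("$1");
    }

    public static void main(String[] args) {
        System.out.println(collapse("mama said TATATATA")); // mama said TA
    }
}
```

Here "mama" is untouched because its unit occurs only twice, while "TATATATA" collapses to "TA".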
Since you asked for a regex solution:
(\\w)(\\w)(\\1\\2){2,}
(\\w)(\\w): matches every pair of consecutive word characters ((.)(.) will catch every consecutive pair of characters of any type), storing them in capturing groups 1 and 2. (\\1\\2) matches whenever the characters in those groups are repeated again immediately afterward, and {2,} requires that repetition to occur two or more times, so the pair appears at least three times in total ({2,10} would require between two and ten repetitions, inclusive). The backslashes are doubled because the pattern is written as a Java string literal.
String s = "hello TATATATA world";
Pattern p = Pattern.compile("(\\w)(\\w)(\\1\\2){2,}");
Matcher m = p.matcher(s);
while (m.find()) System.out.println(m.group());
//prints "TATATATA"
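The snippet above only finds the repeated run. A minimal sketch of the replacement step with the same pattern might look like the following, using "$1$2" to put back the two captured characters. The names PairCollapser and collapse are made up for illustration.

```java
import java.util.regex.Pattern;

class PairCollapser {
    // Same pattern as in the answer: a two-character unit followed by
    // two or more further copies of that unit.
    private static final Pattern PAIR_REPEATED =
            Pattern.compile("(\\w)(\\w)(\\1\\2){2,}");

    static String collapse(String s) {
        // $1$2 is the two-character unit captured by groups 1 and 2.
        return PAIR_REPEATED.matcher(s).replaceAll("$1$2");
    }

    public static void main(String[] args) {
        System.out.println(collapse("hello TATATATA world")); // hello TA world
    }
}
```

Note that this pattern only handles units of exactly two characters; a unit such as "GHR" would need the back-reference form with a variable-length group.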