
Regex to replace a repeating string pattern

I need to replace a repeated pattern within a word with its basic unit. For example, I have the string "TATATATA" and I want to replace it with "TA". I would probably also only replace runs of more than 2 repetitions, to avoid replacing normal words.

I am trying to do it in Java with the replaceAll method.

I think you want this (works for any length of the repeated string):

String result = source.replaceAll("(.+)\\1+", "$1");

Or alternatively, to prioritize shorter matches:

String result = source.replaceAll("(.+?)\\1+", "$1");

It first matches a group of characters, and then matches the same text again (using a back-reference within the pattern itself). I tried it and it seems to do the trick.


Example

String source = "HEY HEY duuuuuuude what'''s up? Trololololo yeye .0.0.0";

System.out.println(source.replaceAll("(.+?)\\1+", "$1"));

// HEY dude what's up? Trolo ye .0
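The question also asks to collapse only runs with more than two repetitions, so that ordinary words survive. Below is a minimal sketch of one way to do that, assuming the same lazy pattern with the quantifier on the back-reference changed from + to {2,}, which requires the unit to occur at least three times in a row. The class and method names are made up for illustration.

```java
class MinRepeats {
    // Collapses a unit only when it occurs at least three times in a row:
    // one occurrence captured by the group plus two or more back-reference matches.
    static String collapseRuns(String source) {
        return source.replaceAll("(.+?)\\1{2,}", "$1");
    }

    // The pattern from the answer above, which collapses any immediate repetition.
    static String collapseAny(String source) {
        return source.replaceAll("(.+?)\\1+", "$1");
    }

    public static void main(String[] args) {
        String source = "TATATATA banana yeye";
        System.out.println(collapseAny(source));  // TA bana ye
        System.out.println(collapseRuns(source)); // TA banana yeye
    }
}
```

With \\1+ the "anan" inside "banana" counts as a repetition and is collapsed, while \\1{2,} leaves both "banana" and "yeye" alone. If "more than 2 repetitions" is meant more strictly, the lower bound can be raised accordingly.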

It is better to use a precompiled Pattern here than String.replaceAll(), which recompiles the regex on every call. For instance:

private static final Pattern PATTERN 
    = Pattern.compile("\\b([A-Z]{2,}?)\\1+\\b");

//...

final Matcher m = PATTERN.matcher(input);
ret = m.replaceAll("$1");

Edit: example:

public static void main(final String... args)
{
    System.out.println("TATATA GHRGHRGHRGHR"
        .replaceAll("\\b([A-Za-z]{2,}?)\\1+\\b", "$1"));
}

This prints:

TA GHR
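For reference, the precompiled-Pattern fragment and the example can be combined into one self-contained sketch. The class name WordCollapser and the method name collapse are hypothetical. The pattern is the [A-Za-z] variant from the example, compiled once and reused.

```java
import java.util.regex.Pattern;

class WordCollapser {
    // Compiled once and reused for every call.
    private static final Pattern REPEATED_WORD =
        Pattern.compile("\\b([A-Za-z]{2,}?)\\1+\\b");

    // Replaces each word that consists entirely of one repeated unit with that unit.
    static String collapse(String input) {
        return REPEATED_WORD.matcher(input).replaceAll("$1");
    }

    public static void main(String[] args) {
        System.out.println(collapse("TATATA GHRGHRGHRGHR banana")); // TA GHR banana
        System.out.println(collapse("mama"));                        // ma
    }
}
```

Because of the \\b anchors, only whole words made of a repeated unit are touched, so "banana" is left alone. A real word such as "mama" is still collapsed, since \\1+ accepts a single repetition.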

Since you asked for a regex solution:

(\\w)(\\w)(\\1\\2){2,}

(\\w)(\\w) matches every pair of consecutive word characters and stores them in capturing groups 1 and 2 ((.)(.) would catch any pair of consecutive characters of any type). (\\1\\2) matches when the characters in those groups are repeated immediately afterward, and {2,} requires that repetition to occur two or more times, so the pair must appear at least three times in total. {2,10} would instead require between two and ten repetitions, inclusive.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s = "hello TATATATA world";
Pattern p = Pattern.compile("(\\w)(\\w)(\\1\\2){2,}");
Matcher m = p.matcher(s);
while (m.find()) {
    System.out.println(m.group()); // prints "TATATATA"
}
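The snippet above only finds the repeated run. To perform the replacement the question asks for, one option is to pass the same pattern to replaceAll with "$1$2" as the replacement, as in this sketch (the class and method names are hypothetical):

```java
class PairCollapser {
    // Replaces a two-character unit that occurs at least three times in a row
    // with a single copy of that unit (groups 1 and 2).
    static String collapsePairs(String s) {
        return s.replaceAll("(\\w)(\\w)(\\1\\2){2,}", "$1$2");
    }

    public static void main(String[] args) {
        System.out.println(collapsePairs("hello TATATATA world")); // hello TA world
        System.out.println(collapsePairs("TATA GHRGHRGHR"));       // TATA GHRGHRGHR
    }
}
```

Note that this pattern only handles units of exactly two word characters, so "GHRGHRGHR" is left unchanged, and "TATA" is left unchanged because the pair repeats only once.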


 