
Java regex: How to reuse a consumed character in pattern matching?

Is there a way to reuse a consumed character of the source in pattern matching?

For example, suppose I want to find a pattern with the regex (a+b+|b+a+), i.e. one or more a followed by one or more b, or vice versa.

Suppose the input is aaaabbbaaaaab

Then the output using regex would be aaaabbb and aaaaab

How can I get the output to be

aaaabbb
bbbaaaaa
aaaaab

Try this way

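import java.util.regex.Matcher;
import java.util.regex.Pattern;
String data = "aaaabbbaaaaab";
// The lookahead captures the whole run pair; the second group limits where a match may start.
Matcher m = Pattern.compile("(?=(a+b+|b+a+))(^|(?<=a)b|(?<=b)a)").matcher(data);
while (m.find()) {
    System.out.println(m.group(1));
}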

This regex uses lookaround and finds every (a+b+|b+a+) that

  • starts at the beginning ^ of the input
  • starts with a b that is preceded by an a
  • starts with an a that is preceded by a b.

Output:

aaaabbb
bbbaaaaa
aaaaab

Is ^ really needed in this regular expression?

Yes. Without ^ the regex would not capture the aaaabbb placed at the start of the input.
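A minimal sketch of that difference, running the pattern with and without the ^ alternative on the same input. The class name AnchorDemo and the helper captures are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnchorDemo {
    // Hypothetical helper: collects group 1 of every match of regex in input.
    static List<String> captures(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String data = "aaaabbbaaaaab";
        // With ^ the run pair that starts at index 0 is reported.
        System.out.println(captures("(?=(a+b+|b+a+))(^|(?<=a)b|(?<=b)a)", data));
        // prints [aaaabbb, bbbaaaaa, aaaaab]

        // Without ^ no alternative can match at index 0, so aaaabbb is lost.
        System.out.println(captures("(?=(a+b+|b+a+))((?<=a)b|(?<=b)a)", data));
        // prints [bbbaaaaa, aaaaab]
    }
}
```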

If I had not added (^|(?<=a)b|(?<=b)a) after (?=(a+b+|b+a+)), the regex would match

aaaabbb
aaabbb
aabbb
abbb
bbbaaaaa
bbaaaaa
baaaaa
aaaaab
aaaab
aaab
aab
ab

so I needed to limit the results to those that start with an a that has a b before it (without including that b in the match, which is exactly what a lookbehind is for) or with a b that is preceded by an a.

But let's not forget an a or b that is placed at the start of the string and is not preceded by anything. To include it we can use ^.
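The overlapping list above can be reproduced with the lookahead alone. This is only a sketch; LookaheadOnlyDemo and overlapping are hypothetical names. Because the lookahead consumes nothing, find() tries every index in turn and reports a capture wherever a run pair begins.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadOnlyDemo {
    // Hypothetical helper: returns group 1 for every index where the lookahead succeeds.
    static List<String> overlapping(String input) {
        List<String> out = new ArrayList<>();
        // Zero-width pattern: each match is empty, so the matcher steps one index at a time.
        Matcher m = Pattern.compile("(?=(a+b+|b+a+))").matcher(input);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints the 12 overlapping captures, from aaaabbb down to ab.
        for (String s : overlapping("aaaabbbaaaaab")) {
            System.out.println(s);
        }
    }
}
```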


Maybe the idea is easier to see with this regex

(?=(a+b+|b+a+))((?<=^|a)b|(?<=^|b)a)

  • (?<=^|a)b matches a b that is at the start of the string or has an a before it
  • (?<=^|b)a matches an a that is at the start of the string or has a b before it
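As a quick check, the sketch below runs both spellings of the pattern and compares their captures. AlternativeRegexDemo, ORIGINAL, ALTERNATIVE and captures are names invented for the example. It relies on Java accepting a lookbehind whose alternatives have different but bounded lengths, which is the case for ^|a.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlternativeRegexDemo {
    static final String ORIGINAL = "(?=(a+b+|b+a+))(^|(?<=a)b|(?<=b)a)";
    static final String ALTERNATIVE = "(?=(a+b+|b+a+))((?<=^|a)b|(?<=^|b)a)";

    // Hypothetical helper: collects group 1 of every match of regex in input.
    static List<String> captures(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String data = "aaaabbbaaaaab";
        System.out.println(captures(ORIGINAL, data));    // prints [aaaabbb, bbbaaaaa, aaaaab]
        System.out.println(captures(ALTERNATIVE, data)); // prints [aaaabbb, bbbaaaaa, aaaaab]
    }
}
```

The only difference is what the match itself covers at index 0: the first version matches the empty string there, the second one consumes the first character. Group 1 is the same in both cases.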

You can simulate this with lookbehind:

((?<=a)b+|(?<=b)a+)

This outputs

bbb aaaaa b
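For comparison, here is a small sketch of that lookbehind-only pattern in Java (LookbehindOnlyDemo and matches are hypothetical names). Note that it returns only the run that follows a different character, not the combined a/b spans asked for in the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookbehindOnlyDemo {
    // Hypothetical helper: returns every run of b preceded by a, or of a preceded by b.
    static List<String> matches(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile("((?<=a)b+|(?<=b)a+)").matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // The leading aaaa has nothing before it, so it is never matched.
        System.out.println(String.join(" ", matches("aaaabbbaaaaab")));
        // prints bbb aaaaa b
    }
}
```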


 