Is there a way to reuse a consumed character of the source in pattern matching?
For example, suppose I want to find a pattern with the regex (a+b+|b+a+)
i.e. one or more a followed by one or more b, or vice versa.
Suppose the input is aaaabbbaaaaab
Then the output using regex would be aaaabbb
and aaaaab
How can I get the output to be
aaaabbb
bbbaaaaa
aaaaab
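Try it this way:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String data = "aaaabbbaaaaab";
// group(1) is captured inside the lookahead, so it is not consumed by the match
Matcher m = Pattern.compile("(?=(a+b+|b+a+))(^|(?<=a)b|(?<=b)a)").matcher(data);
while (m.find())
    System.out.println(m.group(1));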
This regex uses look-around mechanisms and will find every (a+b+|b+a+) that begins at the start of the input (^), at a b that is preceded by a, or at an a that is preceded by b. Output:
aaaabbb
bbbaaaaa
aaaaab
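The same idea can be wrapped in a small helper that returns the matches as a list instead of printing them. This is only a sketch: the class name OverlappingRuns and the method find are made up for illustration, and it assumes the input consists of runs of a and b as in the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
class OverlappingRuns {
    // The lookahead captures the candidate in group 1 without consuming it;
    // the second group only accepts positions where a new run begins.
    private static final Pattern RUN_START =
            Pattern.compile("(?=(a+b+|b+a+))(^|(?<=a)b|(?<=b)a)");

    static List<String> find(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = RUN_START.matcher(input);
        while (m.find()) {
            result.add(m.group(1)); // text seen by the lookahead
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [aaaabbb, bbbaaaaa, aaaaab]
        System.out.println(find("aaaabbbaaaaab"));
    }
}
```

Because the actual match consumes at most one character, the next search can start inside the text that group 1 just reported, which is what lets the results overlap.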
Is ^ really needed in this regular expression?
Yes, without ^ this regex wouldn't capture the aaaabbb placed at the start of the input.
If I hadn't added (^|(?<=a)b|(?<=b)a) after (?=(a+b+|b+a+)), the regex would match
aaaabbb
aaabbb
aabbb
abbb
bbbaaaaa
bbaaaaa
baaaaa
aaaaab
aaaab
aaab
aab
ab
so I needed to limit these results to only those that start with an a that has b before it (without including that b in the match, so look-behind was perfect for that) or with a b that is preceded by a.
But let's not forget about an a or b placed at the start of the string, which is not preceded by anything. To include those we can use ^.
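To see why the extra group is needed, the lookahead can be run on its own. The following is a minimal sketch (the class name LookaheadOnly is hypothetical) that collects everything the bare lookahead captures for the input from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, only for illustration.
class LookaheadOnly {
    static List<String> find(String input) {
        // Lookahead only: every position where a+b+ or b+a+ can start matches.
        Matcher m = Pattern.compile("(?=(a+b+|b+a+))").matcher(input);
        List<String> result = new ArrayList<>();
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        // One line per match; 12 lines for this input, from aaaabbb down to ab.
        find("aaaabbbaaaaab").forEach(System.out::println);
    }
}
```

Each match is empty, so the matcher advances one character at a time and reports a suffix of every run, which is the unfiltered list shown above.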
Maybe it will be easier to show this idea with this regex:
(?=(a+b+|b+a+))((?<=^|a)b|(?<=^|b)a)
(?<=^|a)b will match a b that is placed at the start of the string or has a before it.
(?<=^|b)a will match an a that is placed at the start of the string or has b before it.

You can simulate this with lookbehind:
((?<=a)b+|(?<=b)a+)
This outputs
bbb aaaaa b
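For comparison, here is a minimal sketch of running this lookbehind-only pattern in Java (the class name LookbehindOnly is hypothetical). Note that it returns each run that follows a run of the other letter, not the combined a+b+ or b+a+ pairs asked for in the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, only for illustration.
class LookbehindOnly {
    static List<String> find(String input) {
        // A run of b preceded by a, or a run of a preceded by b.
        Matcher m = Pattern.compile("((?<=a)b+|(?<=b)a+)").matcher(input);
        List<String> result = new ArrayList<>();
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [bbb, aaaaa, b]
        System.out.println(find("aaaabbbaaaaab"));
    }
}
```

The leading aaaa is never reported because nothing precedes it, so the lookbehind cannot succeed there.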