Java regex: How to reuse a consumed character in pattern matching?
Is there a way to reuse an already consumed character of the input in pattern matching?
For example, suppose I want to find matches of the regular expression (a+b+|b+a+), i.e. one or more a followed by one or more b, or vice versa.
Suppose the input is aaaabbbaaaaab
Then the output using this regex would be
aaaabbb
aaaaab
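The behaviour described here can be reproduced with a minimal sketch. The class name ConsumingMatchDemo and the helper plainMatches are hypothetical names chosen for illustration; only Pattern, Matcher, and their standard methods come from java.util.regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ConsumingMatchDemo {
    // Hypothetical helper: collects every match a plain find() loop reports.
    static List<String> plainMatches(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            // Each find() resumes after the end of the previous match,
            // so characters that were already matched are not revisited.
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [aaaabbb, aaaaab]
        System.out.println(plainMatches("a+b+|b+a+", "aaaabbbaaaaab"));
    }
}
```

The run bbb is consumed by the first match, which is why bbbaaaaa never shows up.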
How can I get the output to be
aaaabbb
bbbaaaaa
aaaaab
Try it this way:
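// needs: import java.util.regex.Matcher; import java.util.regex.Pattern;
String data = "aaaabbbaaaaab";
Matcher m = Pattern.compile("(?=(a+b+|b+a+))(^|(?<=a)b|(?<=b)a)").matcher(data);
while (m.find()) {
    // group 1 is captured inside the lookahead, so its characters are not consumed
    System.out.println(m.group(1));
}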
This regex uses lookaround mechanisms and will find every (a+b+|b+a+) that starts at the beginning (^) of the input, starts with a b that is preceded by an a, or starts with an a that is preceded by a b.
Output:
aaaabbb
bbbaaaaa
aaaaab
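For anyone who wants to run this directly, here is a self-contained sketch of the same idea. The class name OverlappingRuns and the method overlappingRuns are hypothetical and not part of the original answer; the regex is the one used above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OverlappingRuns {
    // The lookahead captures the pair of runs in group 1 without consuming it.
    // The second group only lets a match begin at the start of the input
    // or at a boundary where the letter changes (a then b, or b then a).
    private static final Pattern RUN_PAIRS =
            Pattern.compile("(?=(a+b+|b+a+))(^|(?<=a)b|(?<=b)a)");

    static List<String> overlappingRuns(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = RUN_PAIRS.matcher(input);
        while (m.find()) {
            result.add(m.group(1)); // text seen by the lookahead
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [aaaabbb, bbbaaaaa, aaaaab]
        System.out.println(overlappingRuns("aaaabbbaaaaab"));
    }
}
```

Because the real match consumes at most one character, the next search can start inside the text that group 1 just reported, which is what lets bbb appear in two results.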
Is ^ really needed in this regular expression?
Yes, without ^ this regex wouldn't capture the aaaabbb located at the start of the input.
If I hadn't added (^|(?<=a)b|(?<=b)a) after (?=(a+b+|b+a+)), the regex would match
aaaabbb
aaabbb
aabbb
abbb
bbbaaaaa
bbaaaaa
baaaaa
aaaaab
aaaab
aaab
aab
ab
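The list above can be checked with a small sketch that uses only the lookahead. The class name LookaheadOnlyDemo and the method lookaheadOnly are hypothetical names introduced for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadOnlyDemo {
    // With nothing but a lookahead, every match is empty, so the matcher
    // advances one character at a time and reports a hit at every position
    // where a pair of runs can begin.
    static List<String> lookaheadOnly(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile("(?=(a+b+|b+a+))").matcher(input);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> hits = lookaheadOnly("aaaabbbaaaaab");
        System.out.println(hits.size()); // prints 12
        hits.forEach(System.out::println);
    }
}
```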
So I needed to limit these results to only those that start with an a that has a b before it (without including that b in the match, which is exactly what lookbehind is for), or with a b that is preceded by an a.
But let's not forget about an a or b that is located at the start of the string and is not preceded by anything. To include those we can use ^.
Maybe the idea is easier to see with this regex:
(?=(a+b+|b+a+))((?<=^|a)b|(?<=^|b)a)
(?<=^|a)b will match a b that is at the start of the string or has an a before it.
(?<=^|b)a will match an a that is at the start of the string or has a b before it.

You can simulate this with lookbehind:
((?<=a)b+|(?<=b)a+)
This outputs
bbb
aaaaa
b
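A minimal sketch for trying the lookbehind-only pattern follows. The class name LookbehindOnlyDemo and the method runsAfterBoundary are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookbehindOnlyDemo {
    // Matches a run of b that follows an a, or a run of a that follows a b.
    // The preceding letter is checked by the lookbehind and is not part of the match.
    static List<String> runsAfterBoundary(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile("((?<=a)b+|(?<=b)a+)").matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [bbb, aaaaa, b]
        System.out.println(runsAfterBoundary("aaaabbbaaaaab"));
    }
}
```

Note that this reports only the second run of each pair, so the preceding run would still have to be joined on separately to get aaaabbb, bbbaaaaa, and aaaaab.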