
Java regex: How to reuse a consumed character in pattern matching?

Is there a way to reuse a character of the source that has already been consumed in pattern matching?

For example, suppose I want to find a pattern with the regex (a+b+|b+a+), i.e. one or more a followed by one or more b, or vice versa.

Suppose the input is aaaabbbaaaaab

Then the output using the regex would be aaaabbb and aaaaab.

How can I get the output to be the following?

aaaabbb
bbbaaaaa
aaaaab
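
To see why a plain scan cannot produce the middle result, here is a minimal sketch (the class and method names are illustrative and not part of the original post). Each call to find() resumes where the previous match ended, so the bbb consumed by the first match is no longer available to start bbbaaaaa.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the name is an assumption made for illustration.
class ConsumingMatchDemo {
    // Collects every match of the plain (consuming) pattern, in order.
    static List<String> plainMatches(String input) {
        Matcher m = Pattern.compile("a+b+|b+a+").matcher(input);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints [aaaabbb, aaaaab]
        System.out.println(plainMatches("aaaabbbaaaaab"));
    }
}
```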

Try it this way:

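import java.util.regex.Matcher;
import java.util.regex.Pattern;

String data = "aaaabbbaaaaab";
// The lookahead captures the run in group 1; the second group only selects valid start positions.
Matcher m = Pattern.compile("(?=(a+b+|b+a+))(^|(?<=a)b|(?<=b)a)").matcher(data);
while (m.find())
    System.out.println(m.group(1));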

This regex uses lookaround and will find an (a+b+|b+a+) that

  • exists at the start ^ of the input
  • starts with a b that is preceded by an a
  • starts with an a that is preceded by a b.

Output:

aaaabbb
bbbaaaaa
aaaaab
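
The same idea can be wrapped in a small, self-contained helper. This is only a sketch: the class name OverlapDemo and the method findRuns are assumptions made for illustration. The lookahead captures the run in group 1 without consuming it, so the characters remain available for the next match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; names are not from the original answer.
class OverlapDemo {
    // Lookahead captures the run; the second group restricts where a run may start.
    private static final Pattern RUNS =
            Pattern.compile("(?=(a+b+|b+a+))(^|(?<=a)b|(?<=b)a)");

    // Returns the overlapping runs captured in group 1, in order of appearance.
    static List<String> findRuns(String input) {
        Matcher m = RUNS.matcher(input);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints [aaaabbb, bbbaaaaa, aaaaab]
        System.out.println(findRuns("aaaabbbaaaaab"));
    }
}
```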

Is ^ really needed in this regular expression?

Yes, without ^ this regex wouldn't capture the aaaabbb at the start of the input.

If I hadn't added (^|(?<=a)b|(?<=b)a) after (?=(a+b+|b+a+)), the regex would match

aaaabbb
aaabbb
aabbb
abbb
bbbaaaaa
bbaaaaa
baaaaa
aaaaab
aaaab
aaab
aab
ab

so I needed to limit these results to only those that start with an a that has a b before it (without including that b in the match, so look-behind was perfect for that) and those that start with a b that is preceded by an a.

But let's not forget about an a or b placed at the start of the string, which is not preceded by anything. To include it we can use ^.
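
The unrestricted behaviour described above can be checked with a short sketch (class and method names are hypothetical). With only the lookahead, every match is zero-width, so the matcher steps forward one position at a time and reports a run from every index where one begins.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the name is an assumption made for illustration.
class LookaheadOnlyDemo {
    // Uses only the capturing lookahead, with no restriction on the start position.
    static List<String> allSuffixRuns(String input) {
        Matcher m = Pattern.compile("(?=(a+b+|b+a+))").matcher(input);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints the twelve results listed above, one per line.
        allSuffixRuns("aaaabbbaaaaab").forEach(System.out::println);
    }
}
```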


Maybe it will be easier to show this idea with this regex:

(?=(a+b+|b+a+))((?<=^|a)b|(?<=^|b)a)

  • (?<=^|a)b will match a b that is placed at the start of the string or has an a before it
  • (?<=^|b)a will match an a that is placed at the start of the string or has a b before it
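
As a quick check that the two spellings behave the same on the sample input, the following sketch runs both and collects group 1. The class, constant, and method names are illustrative assumptions; only the two regex strings come from the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical comparison helper; names are not from the original answer.
class AnchorVariantDemo {
    static final String CARET_AS_ALTERNATIVE = "(?=(a+b+|b+a+))(^|(?<=a)b|(?<=b)a)";
    static final String CARET_IN_LOOKBEHIND = "(?=(a+b+|b+a+))((?<=^|a)b|(?<=^|b)a)";

    // Collects group 1 of every match of the given regex.
    static List<String> groupOne(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String data = "aaaabbbaaaaab";
        System.out.println(groupOne(CARET_AS_ALTERNATIVE, data));
        System.out.println(groupOne(CARET_IN_LOOKBEHIND, data));
    }
}
```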

You can simulate this with lookbehind:

((?<=a)b+|(?<=b)a+)

This outputs

bbb aaaaa b
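
A minimal runnable sketch of this lookbehind-only pattern follows (class and method names are hypothetical). Note that it returns each run that follows a change of letter, not the overlapping pairs asked for in the question, so the leading aaaa is never reported.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the name is an assumption made for illustration.
class LookbehindRunsDemo {
    // Matches a run of b preceded by an a, or a run of a preceded by a b.
    static List<String> runsAfterSwitch(String input) {
        Matcher m = Pattern.compile("((?<=a)b+|(?<=b)a+)").matcher(input);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints [bbb, aaaaa, b]
        System.out.println(runsAfterSwitch("aaaabbbaaaaab"));
    }
}
```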


 