
Perl greedy regex is not acting greedy

Given the following code:

use strict;
use warnings;

my $text = "asdf(blablabla)";

$text =~ s/(.*?)\((.*)\)/$2/;
print "\nfirst match: $1";
print "\nsecond match: $2";

I expected that $2 would catch my last bracket, yet my output is:
[image: output screenshot]
If .* is greedy by default, why did it stop at the bracket?

The .* is a greedy subpattern, but greediness has no effect on where a group begins and ends. A group is defined by a pair of unescaped parentheses (see "Use Parentheses for Grouping and Capturing").

See where your group boundaries are:

s/(.*?)\((.*)\)/$2/
  | G1|  |G2| 

So, the \( and \) that match the literal ( and ) are outside the groups, and will not be part of either $1 or $2.
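To make the group boundaries visible, here is a minimal sketch that runs the original pattern as a plain match (no substitution) and prints each capture next to the overall match. The variable names $g1, $g2 and $whole are introduced only for illustration.

```perl
use strict;
use warnings;

my $text = "asdf(blablabla)";

# In list context, a successful match returns the captured groups.
my ($g1, $g2) = $text =~ /(.*?)\((.*)\)/;

# $& holds everything the whole pattern matched, captured or not.
my $whole = $&;

print "group 1:     $g1\n";     # asdf
print "group 2:     $g2\n";     # blablabla
print "whole match: $whole\n";  # asdf(blablabla)
```

The parentheses show up in the whole match but in neither group, because \( and \) sit outside the capturing parentheses.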

If you need the ) to be part of $2, use

s/(.*?)\((.*\))/$2/
              ^
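As a quick check, the sketch below compares the capture under different group placements. The third variant, which pulls both parentheses into the group, is not in the original question and is added only as an illustration; all variable names are hypothetical.

```perl
use strict;
use warnings;

my $text = "asdf(blablabla)";

# Original placement: both literal parentheses are outside the group.
my ($inner_only) = $text =~ /.*?\((.*)\)/;

# Closing parenthesis moved inside the group.
my ($with_close) = $text =~ /.*?\((.*\))/;

# Illustration only: both parentheses inside the group.
my ($with_both) = $text =~ /.*?(\(.*\))/;

# Apply the corrected substitution to a copy so $text stays unchanged.
(my $replaced = $text) =~ s/(.*?)\((.*\))/$2/;

print "inner only: $inner_only\n";  # blablabla
print "with close: $with_close\n";  # blablabla)
print "with both:  $with_both\n";   # (blablabla)
print "replaced:   $replaced\n";    # blablabla)
```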

A regex engine processes both the string and the pattern from left to right. The first (.*?) is handled first. Because it is lazy (it matches as few characters as possible while still allowing the overall match to succeed), it matches up to the first literal ( symbol, and the whole part before the ( is stored in Group 1. Then the ( is matched but not captured. Next, (.*) matches any zero or more characters other than a newline up to the last ) symbol and stores them in Group 2. Finally, the ) is matched, again without being captured.

The point is that .* first grabs everything up to the end of the string, and then backtracking happens because the engine still has to match the final ) in the pattern. That ) must be matched, but in your pattern it is not captured, so it is not part of Group 2 due to the group boundary placement. You can step through the match in the regex debugger on the regex demo page.
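The greedy behaviour is easier to see on an input with more than one pair of parentheses. The sketch below uses a hypothetical string, "asdf(one)(two)", which is not from the question, and compares the greedy (.*) with a lazy (.*?) in the second group.

```perl
use strict;
use warnings;

# Hypothetical input with two parenthesised parts.
my $text = "asdf(one)(two)";

# Greedy: .* runs to the end of the string, then backtracks one
# character so the final \) can match the LAST closing parenthesis.
my ($before, $greedy) = $text =~ /(.*?)\((.*)\)/;

# Lazy: .*? expands one character at a time and stops at the
# FIRST closing parenthesis.
my ($lazy) = $text =~ /.*?\((.*?)\)/;

print "before: $before\n";  # asdf
print "greedy: $greedy\n";  # one)(two
print "lazy:   $lazy\n";    # one
```

In both cases the closing parenthesis that \) consumes stays out of the group; greediness only decides which closing parenthesis that is. To watch the engine's steps locally, Perl's re pragma (use re 'debug';) prints a trace of the match attempt.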
