Perl regex and capturing groups

The following prints ac | a | bbb | c:

    #!/usr/bin/env perl
    use strict;
    use warnings;
    # use re 'debug';
    
    my $str = 'aacbbbcac';
    
    if ($str =~ m/((a+)?(b+)?(c))*/) {
       print "$1 | $2 | $3 | $4\n";
    }

It seems like failed matches do not reset the captured group variables. What am I missing?

"It seems like failed matches don't reset the captured group variables"

There are no failed matches here. Your regex matches the string fine, although some inner groups fail to match in some repetitions. On each repetition of the outer group, an inner group that matches overwrites its previous value, while an inner group that does not match in that repetition keeps its value from an earlier one.

Let's see how the regex match proceeds:

  • First, (a+)?(b+)?(c) matches aac. Since (b+)? is optional and there is no b at this position, it matches nothing. At this stage, the capture groups contain:

    • $1 contains the entire repetition - aac
    • $2 contains the (a+)? part - aa
    • $3 contains the (b+)? part - undef (the group did not participate)
    • $4 contains the (c) part - c
  • The rest of the string, bbbcac, is still unmatched, so the * repeats the group and (a+)?(b+)?(c) matches bbbc. Since (a+)? is optional, it matches nothing this time.

    • $1 contains the entire repetition - bbbc. This overwrites the previous value in $1
    • $2 doesn't match, so it keeps the text matched previously - aa
    • $3 matches this time. It contains - bbb
    • $4 matches c
  • Finally, (a+)?(b+)?(c) goes on to match the last part - ac.

    • $1 contains the entire repetition - ac
    • $2 matches a this time. This overwrites the previous value in $2. It now contains - a
    • $3 doesn't match this time, as there is no b in this part. It keeps the value from the previous repetition - bbb
    • $4 matches c. This overwrites the value from the previous repetition. It now contains - c

Now there is nothing left in the string to match. The final values of the capture groups are:

  • $1 - ac
  • $2 - a
  • $3 - bbb
  • $4 - c
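The three repetitions walked through above can be observed one at a time. The following is a minimal sketch, not part of the original question: it drops the outer * and instead steps through the string with \G and the /g flag, so every pass is a separate successful match. In that form a group that did not participate is undef for that pass rather than holding an older value. The @rows array is a name introduced here for illustration only.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my $str = 'aacbbbcac';

# Same inner pattern as in the question, but without the outer *.
# \G anchors each attempt where the previous match ended, and /g in
# scalar context advances one match per loop pass.
my @rows;
while ($str =~ /\G((a+)?(b+)?(c))/g) {
    # Groups that did not participate in this match are undef.
    push @rows, join ' | ', map { defined $_ ? $_ : 'undef' } ($1, $2, $3, $4);
}

print "$_\n" for @rows;
# aac | aa | undef | c
# bbbc | undef | bbb | c
# ac | a | undef | c
```

Comparing the last row with the output of the original program (ac | a | bbb | c) shows that the bbb in $3 is left over from the second repetition.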

As odd as it seems, this is the "expected" behavior. Here's a quote from the perlre docs:

NOTE: Failed matches in Perl do not reset the match variables, which makes it easier to write code that tests for a series of more specific cases and remembers the best match.
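The behavior described in that note can be seen with two separate matches in the same scope. This is a small sketch with made-up strings and variable names ($first_ok, $second_ok, $stale, $fresh), not code from the question. It also shows one common way to avoid reading a stale value: take the captures from the match's list-context return value instead of from $1.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# First match succeeds and sets $1 to 'b'.
my $first_ok = 'abc' =~ /(b)/;

# Second match fails; $1 is left exactly as it was.
my $second_ok = 'xyz' =~ /(q)/;
my $stale = $1;

print "first matched: ",  ($first_ok  ? 'yes' : 'no'), "\n";
print "second matched: ", ($second_ok ? 'yes' : 'no'), "\n";
print "\$1 after the failed match: $stale\n";

# In list context a failed match returns an empty list,
# so $fresh is undef instead of a leftover capture.
my ($fresh) = 'xyz' =~ /(q)/;
print "list-context capture: ", (defined $fresh ? $fresh : 'undef'), "\n";
```

Checking the return value of the match before reading $1, or using the list-context form, keeps the code from acting on a capture that belongs to an earlier match.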

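For parenthesis grouping such as /(\d+)/, the documentation says to use \1, \2, ... or \g{1}, \g{2}, ... to refer back to a group from inside the pattern itself. Writing $1 or $2 inside the pattern does not do that: the variable is interpolated with whatever an earlier match left in it. In the replacement part of a substitution it is the other way around: $1 is the preferred form, and \1 still works there but draws the warning "\1 better written as $1" when warnings are enabled.

    # Example to turn a CSS href into a local CSS path.
    # Transforms <link href="http://..." into <link href="css/..."

    # ... inside a loop ...

    my $localcss = $_; # one line from the file
    # $1 in the replacement is the file name captured by ([^\/]+\.css")
    $localcss =~ s/href.+\/([^\/]+\.css")/href="css\/$1/g;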
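To make the two kinds of reference concrete, here is a short sketch under stated assumptions: the sample strings and the example.com URL are invented for illustration, and s{...}{...} is used as the delimiter only so that the slashes need no escaping. The first substitution uses \1 inside the pattern to find a repeated word; the second uses $1 in the replacement to keep the captured file name.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# \1 inside the pattern: match a word followed by the same word again.
my $text = 'the the cat sat sat down';
(my $deduped = $text) =~ s/\b(\w+) \1\b/$1/g;
print "$deduped\n";    # the cat sat down

# $1 in the replacement: reuse the captured file name.
# The URL below is a hypothetical sample line.
my $line = '<link href="http://example.com/static/main.css" rel="stylesheet">';
(my $localcss = $line) =~ s{href.+/([^/]+\.css")}{href="css/$1}g;
print "$localcss\n";   # <link href="css/main.css" rel="stylesheet">
```

Note that .+ is greedy, so on a line containing several link tags this pattern would span from the first href to the last .css" file name; that matches the original answer's one-tag-per-line assumption.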
