How can I find _all_ locations of a regex match in Perl?

I can see from this answer that if I do

sub match_all_positions {
    my ($regex, $string) = @_;
    my @ret;
    while ($string =~ /$regex/g) { push @ret, $-[0] }
    return @ret
}

print join ',', match_all_positions('0{3}', '001100010000');

I get

4,8

What do I need to do to get the indexes of all matches, even when they overlap, such as positions 8 and 9 in the example above?

I can do

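sub match_all_positions_b {
    my ($substr, $string) = @_;
    my $length = length $substr;
    # index() returns -1 when there is no match; 0 is a valid position
    return unless index($string, $substr) >= 0;
    my @res;
    my $i = 0;
    while ($i <= (length($string) - $length)) {
        $i = index($string, $substr, $i);
        last if $i < 0;
        push @res, $i++;    # record the hit, then resume one character later
    }
    return @res;
}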

print join ',', match_all_positions_b('000', '001100010000');

which just lets me match a substring, or

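sub match_all_positions_c {
    my ($regex, $string) = @_;
    # anchor the pattern; (?:...) keeps '^' applied to the whole pattern
    my $re = '^(?:' . $regex . ')';
    my @res;
    # try the anchored pattern at every offset of the string
    for (0 .. length($string)) {
        push @res, $_ if substr($string, $_) =~ /$re/;
    }
    return @res;
}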

print join ',', match_all_positions_c('0{3}', '001100010000');

which is twice as slow.

Is there a way to get all matches, even when they overlap? Or should I just take the speed loss because it's inherent to using regex matches?

You need to update your regex for zero-width look-ahead matching.

Try calling your function like this:

print join ',', match_all_positions('(?=0{3})', '001100010000');
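
The look-ahead helps because it matches without consuming any characters: after each zero-width match the engine retries one character further along instead of jumping past the three zeros, so overlapping hits are found. A minimal sketch of a wrapper that adds the look-ahead itself (the name match_all_overlapping is made up for this example and is not part of the original answer):

```perl
use strict;
use warnings;

# Hypothetical helper: wraps the caller's pattern in a zero-width
# look-ahead so that matches are allowed to overlap.
sub match_all_overlapping {
    my ($regex, $string) = @_;
    my @ret;
    # (?=...) consumes nothing, so after each hit the engine moves on
    # by one character rather than skipping the whole match.
    while ($string =~ /(?=$regex)/g) {
        push @ret, $-[0];    # $-[0] is the offset where the match starts
    }
    return @ret;
}

print join(',', match_all_overlapping('0{3}', '001100010000')), "\n";    # prints 4,8,9
```

Callers can then pass the plain pattern ('0{3}') and do not have to remember to add the look-ahead themselves.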

If you want to find the positions at which it matches:

my @matches;
push @matches, "$-[1]:$+[1]" while "aabbcc" =~ /(?=(a.*c))/sg;

Output:

0:6
1:6
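
Each starting offset yields at most one result here, namely the match the pattern prefers at that offset (the longest one for a greedy `.*`). The capture group inside the look-ahead is what records the extent of the match while the look-ahead itself stays zero-width. A sketch of the same idea as a reusable function, assuming the caller's pattern is safe to interpolate; overlapping_spans is a hypothetical name:

```perl
use strict;
use warnings;

# Hypothetical helper: returns "start:end" for the match found at each
# starting offset of $string.
sub overlapping_spans {
    my ($regex, $string) = @_;
    my @spans;
    # Group 1 sits inside the look-ahead, so it captures the text that
    # would match without advancing the match position.
    while ($string =~ /(?=($regex))/sg) {
        push @spans, "$-[1]:$+[1]";    # start and end offsets of group 1
    }
    return @spans;
}

print join(',', overlapping_spans('a.*c', 'aabbcc')), "\n";          # prints 0:6,1:6
print join(',', overlapping_spans('0{3}', '001100010000')), "\n";    # prints 4:7,8:11,9:12
```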

If you want all possible matches,

local our @matches;
"aabbcc" =~ /(a.*?c)(?{ push @matches, "$-[1]:$+[1]" })(?!)/s;

Output:

0:5
0:6
1:5
1:6
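
This version relies on forced backtracking: the (?{ ... }) block runs every time the capture group succeeds, and (?!) then always fails, so the engine backs up and tries the next way of matching until none are left. A sketch that packages the idiom in a function; the name all_spans_a_to_c is hypothetical, and the pattern is kept literal because interpolating a pattern that carries its own code blocks would need `use re 'eval'`:

```perl
use strict;
use warnings;

our @found;    # package variable so the (?{ ... }) block can see it

# Hypothetical helper: every "start:end" span at which /a.*?c/ can match.
sub all_spans_a_to_c {
    my ($string) = @_;
    local @found = ();
    # 1. (a.*?c) finds a candidate match.
    # 2. (?{ ... }) records the offsets of that candidate.
    # 3. (?!) can never match, which forces the engine to backtrack
    #    and try the next candidate.
    $string =~ /(a.*?c)(?{ push @found, "$-[1]:$+[1]" })(?!)/s;
    my @copy = @found;    # copy before 'local' restores the old value
    return @copy;
}

print join(',', all_spans_a_to_c('aabbcc')), "\n";    # prints 0:5,0:6,1:5,1:6
```

The overall match always fails, so only the side effect of the code block matters. Because every candidate is visited, this can be much slower than the look-ahead form on long inputs.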
