How can I find the location of a regex match in Perl?

Question

I need to write a function that receives a string and a regex. I need to check whether there is a match and return the start and end positions of the match. (The regex has already been compiled with qr//.)

The function might also receive a "global" flag, in which case I need to return the (start, end) pairs of all the matches.

I cannot change the regex, not even by adding () around it, because the user might use () and \1. Maybe I can use (?:).

Example: given "ababab" and the regex qr/ab/, in the global case I need to get back 3 pairs of (start, end).

Answer 1

The built-in variables @- and @+ hold the start and end positions, respectively, of the last successful match. $-[0] and $+[0] correspond to the entire pattern, while $-[N] and $+[N] correspond to the $N ($1, $2, etc.) submatches.
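As a minimal sketch of how these offsets read in practice: the helper name group_offsets and the sample date string are illustrative assumptions, not part of the answer. End offsets are exclusive, so substr with (start, end - start) recovers the matched text.

```perl
use strict;
use warnings;

# Hypothetical helper: offsets of group $n (0 = whole match) for the first match.
sub group_offsets {
    my ($regex, $string, $n) = @_;
    return unless $string =~ /$regex/;
    return unless defined $-[$n];      # group did not take part in the match
    return ($-[$n], $+[$n]);
}

my $string = "2008-09-17";
my $regex  = qr/(\d+)-(\d+)/;

my ($start, $end) = group_offsets($regex, $string, 0);
printf "whole match: %d..%d (%s)\n", $start, $end,
    substr($string, $start, $end - $start);

my ($s2, $e2) = group_offsets($regex, $string, 2);
printf "group 2:     %d..%d (%s)\n", $s2, $e2,
    substr($string, $s2, $e2 - $s2);
```

Here the whole match covers "2008-09" and group 2 covers "09".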

Answer 2

Forget my previous post, I've got a better idea.

sub match_positions {
    my ($regex, $string) = @_;
    return if not $string =~ /$regex/;
    # @- and @+ hold the offsets of the last successful match.
    return ($-[0], $+[0]);
}

sub match_all_positions {
    my ($regex, $string) = @_;
    my @ret;
    # Each /g iteration resumes where the previous match ended.
    while ($string =~ /$regex/g) {
        push @ret, [ $-[0], $+[0] ];
    }
    return @ret;
}

This technique doesn't change the regex in any way.
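The question asks for one function with a "global" flag. A possible way to fold the two subs above into one, as a sketch only: the name find_matches and the $global parameter are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical wrapper: returns a list of [start, end] pairs.
# With a false $global it stops after the first match.
sub find_matches {
    my ($regex, $string, $global) = @_;
    my @pairs;
    while ($string =~ /$regex/g) {
        push @pairs, [ $-[0], $+[0] ];
        last unless $global;
    }
    return @pairs;
}

my @all = find_matches(qr/ab/, "ababab", 1);
print join(", ", map { "($_->[0],$_->[1])" } @all), "\n";
# prints: (0,2), (2,4), (4,6)
```

Because $string is a copy local to the sub, leaving the loop early does not leave a stale pos() on the caller's string.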

Edited to add: to quote from perlvar on $1..$9: "These variables are all read-only and dynamically scoped to the current BLOCK." In other words, if you want to use $1..$9 in the caller, you cannot do the matching inside a subroutine.
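One way around that scoping, sketched under the assumption that the caller only needs the captured text: have the subroutine hand the groups back as return values. The helper name match_captures is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the text of every capture group of the
# first match, rebuilt from @- and @+ before the sub's scope is left.
sub match_captures {
    my ($regex, $string) = @_;
    return unless $string =~ /$regex/;
    my @captures;
    for my $n (1 .. $#+) {
        # A group that did not take part in the match stays undef.
        push @captures, defined $-[$n]
            ? substr($string, $-[$n], $+[$n] - $-[$n])
            : undef;
    }
    return @captures;
}

"zz" =~ /(z)/;                      # the caller's own match sets $1
my @caps = match_captures(qr/(\d+)-(\d+)/, "2008-09-17");
my $caller_capture = $1;            # not affected by the match inside the sub

print "returned by the sub: @caps\n";
print "caller's \$1: $caller_capture\n";
```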

Answer 3

The pos function gives you the position where the last m//g match ended. If you put your regex in parentheses, you can get the length (and thus the start) using length $1. Like this:

sub match_positions {
    my ($regex, $string) = @_;
    # pos() is only set by a /g match, so match with /g in scalar context.
    return if not $string =~ /($regex)/g;
    return (pos($string) - length $1, pos($string));
}

sub all_match_positions {
    my ($regex, $string) = @_;
    my @ret;
    while ($string =~ /($regex)/g) {
        # pos() is the end of the match; subtract the length to get the start.
        push @ret, [pos($string) - length $1, pos($string)];
    }
    return @ret;
}
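Wrapping the pattern in an extra capturing group renumbers the pattern's own groups, which the question rules out. A possible variant, assuming pos is still wanted for the end offset, takes the start from $-[0] instead of length $1 and leaves the pattern untouched. The sub name is illustrative.

```perl
use strict;
use warnings;

# Sketch: after a successful /g match in scalar context, pos($string) is the
# offset just past the match (the same value as $+[0]); $-[0] is the start.
sub all_positions_via_pos {
    my ($regex, $string) = @_;
    my @ret;
    while ($string =~ /$regex/g) {
        push @ret, [ $-[0], pos($string) ];
    }
    return @ret;
}

for my $pair (all_positions_via_pos(qr/ab/, "ababab")) {
    print "$pair->[0] $pair->[1]\n";
}
```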

Answer 4

#!/usr/bin/perl

# search the positions of the CpGs in the human genome

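sub match_positions {
    my ($regex, $string) = @_;
    # pos() is only set by a /g match; it is the offset just past the match.
    return if not $string =~ /($regex)/g;
    # Inclusive end, matching all_match_positions below.
    return (pos($string) - length $1, pos($string) - 1);
}

sub all_match_positions {
    my ($regex, $string) = @_;
    my @ret;
    while ($string =~ /($regex)/g) {
        # [start, inclusive end] of each match
        push @ret, [pos($string) - length $1, pos($string) - 1];
    }
    return @ret;
}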

my $regex='CG';
my $string="ACGACGCGCGCG";
my $cgap=3;    
my @pos=all_match_positions($regex,$string);

my @hgcg;

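# collect the (inclusive) end position of every CpG
foreach my $pos (@pos) {
    push @hgcg, $pos->[1];
}

# length of each window spanning $cgap consecutive CpGs
foreach my $i (0 .. ($#hgcg - $cgap + 1)) {
    my $len = $hgcg[$i + $cgap - 1] - $hgcg[$i] + 2;
    print "$len\n";
}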

Answer 5

You can also use the deprecated $` variable, if you're willing to have all the REs in your program execute slower. From perlvar:

   $`      The string preceding whatever was matched by the last successful pattern match (not
           counting any matches hidden within a BLOCK or eval enclosed by the current BLOCK).
           (Mnemonic: "`" often precedes a quoted string.)  This variable is read-only.

           The use of this variable anywhere in a program imposes a considerable performance penalty
           on all regular expression matches.  See "BUGS".
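The answer does not spell out how the prematch string yields a position. A minimal sketch, assuming the length of $` is taken as the start offset and the length of $& (the matched text) is added to get the end; the sub name is illustrative, and the performance caveat quoted above applies to both variables.

```perl
use strict;
use warnings;

# Hypothetical helper: start is the length of the text before the match,
# end is start plus the length of the matched text.
sub match_positions_prematch {
    my ($regex, $string) = @_;
    return unless $string =~ /$regex/;
    my $start = length $`;
    return ($start, $start + length $&);
}

my ($start, $end) = match_positions_prematch(qr/ab/, "xxabyy");
print "$start $end\n";   # prints: 2 4
```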

5 answers

Answer 1: 78, 2008-09-17 20:54:04
Answer 2: 21, 2008-09-17 20:47:29
Answer 3: 8, 2008-09-17 20:38:53
Answer 4: 0
Answer 5: 0, 2008-09-17 20:43:13
