Perl Regex Multiple Matches

I'm looking for a regular expression that will behave as follows:

input: "hello world."

output: he, el, ll, lo, wo, or, rl, ld

My idea was something along the lines of:

    while($string =~ m/(([a-zA-Z])([a-zA-Z]))/g) {
        print "$1-$2 ";
    }

But that does something a little bit different.
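For reference, here is a minimal sketch of what that loop actually does, assuming the input "hello world." from above. The array name @seen is only there for illustration. Each successful m//g match consumes both letters, so the next search starts after them and the pairs never overlap.

```perl
use strict;
use warnings;

my $string = "hello world.";
my @seen;    # hypothetical name, collects what the loop would print

# Each match consumes two letters, so the next search starts
# after them; $1 is the whole pair and $2 is only its first letter.
while ($string =~ m/(([a-zA-Z])([a-zA-Z]))/g) {
    push @seen, "$1-$2";
}

print "@seen\n";    # he-h ll-l wo-w rl-r
```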

It's tricky. You have to capture it, save it, and then force a backtrack.

You can do that this way:

use v5.10;   # first release with backtracking control verbs

my $string = "hello, world!";
my @saved;

my $pat = qr{
    ( \pL {2} )
    (?{ push @saved, $^N })
    (*FAIL)
}x;

@saved = ();
$string =~ $pat;
my $count = @saved;
printf "Found %d matches: %s.\n", $count, join(", " => @saved);

This produces:

Found 8 matches: he, el, ll, lo, wo, or, rl, ld.

If you do not have v5.10, or you have a headache, you can use this:

my $string = "hello, world!";
my @pairs = $string =~ m{
  # we can only match at positions where the
  # following sneak-ahead assertion is true:
    (?=                 # zero-width look ahead
        (               # begin stealth capture
            \pL {2}     #       save off two letters
        )               # end stealth capture
    )
  # succeed after matching nothing, force reset
}xg;

my $count = @pairs;
printf "Found %d matches: %s.\n", $count, join(", " => @pairs);

That produces the same output as before.

But you might still have a headache.

There is no need to "force backtracking"!

push @pairs, "$1$2" while /([a-zA-Z])(?=([a-zA-Z]))/g;

Though you might want to match any letter rather than the limited set you specified.

push @pairs, "$1$2" while /(\pL)(?=(\pL))/g;
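The one-liners above rely on $_ and an existing @pairs. As a rough, self-contained sketch (the variable $string and the sample input are assumptions taken from the question), the same idea could be run like this:

```perl
use strict;
use warnings;

my $string = "hello world.";
my @pairs;

# (\pL) consumes one letter; the lookahead captures the following
# letter in $2 without consuming it, so successive matches overlap.
push @pairs, "$1$2" while $string =~ /(\pL)(?=(\pL))/g;

print join(", ", @pairs), "\n";    # he, el, ll, lo, wo, or, rl, ld
```

Because only one letter is consumed per match, the search resumes at the second letter of each pair, which is what produces the overlapping output.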

Yet another way to do it. It doesn't use any regexp magic. It does use nested maps, but these could easily be translated to for loops if desired.

#!/usr/bin/env perl

use strict;
use warnings;

my $in = "hello world.";
my @words = $in =~ /(\b\pL+\b)/g;

my @out = map {
  my @chars = split '';
  map { $chars[$_] . $chars[$_+1] } ( 0 .. $#chars - 1 );
} @words;

print join ',', @out;
print "\n";

Again, for me this is more readable than a strange regex, YMMV.
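For anyone who prefers explicit loops, here is one possible translation of the nested maps. It is only a sketch: it swaps split and array indexing for substr, and the loop variables $word and $i are names chosen for illustration.

```perl
use strict;
use warnings;

my $in = "hello world.";
my @out;

# Outer loop: each run of letters. Inner loop: each adjacent pair.
# For a one-letter word the range 0 .. -1 is empty, so nothing is pushed.
for my $word ($in =~ /(\pL+)/g) {
    for my $i (0 .. length($word) - 2) {
        push @out, substr($word, $i, 2);
    }
}

print join(',', @out), "\n";    # he,el,ll,lo,wo,or,rl,ld
```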

I would use a capturing group inside a lookahead:

(?=([a-zA-Z]{2}))
    ------------
         |->group 1 captures two English letters 

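As a small sketch of how this pattern behaves in Perl (the variables $string and @found are assumptions for illustration), the loop below records pos() for each match. The lookahead matches an empty string, so pos() stays at the start of the pair, and Perl will not accept a second empty match at the same position, which moves the search forward by one character each time.

```perl
use strict;
use warnings;

my $string = "hello world.";
my @found;    # hypothetical name, collects "offset:pair" strings

# The overall match is empty, so pos() is the offset where the pair
# starts; group 1 holds the two letters seen by the lookahead.
while ($string =~ /(?=([a-zA-Z]{2}))/g) {
    push @found, pos($string) . ":$1";
}

print "@found\n";    # 0:he 1:el 2:ll 3:lo 6:wo 7:or 8:rl 9:ld
```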

You can do this by looking for letters, using the pos function to get the position where the match ended, \G to anchor another regex at that position, and substr to read a few characters from the string.

use v5.10;
use strict;
use warnings;

my $letter_re = qr/[a-zA-Z]/;

my $string = "hello world.";
while( $string =~ m{ ($letter_re) }gx ) {
    # Skip it if the next character isn't a letter
    # \G will match where the last m//g left off.
    # It's pos() in a regex.
    next unless $string =~ /\G $letter_re /x;

    # pos() is still where the last m//g left off.
    # Use substr to print the character before it (the one we matched)
    # and the next one, which we know to be a letter.
    say substr $string, pos($string)-1, 2;
}

You can put the "check the next letter" logic inside the original regex with a zero-width positive assertion, (?=pattern). Zero-width means it consumes no characters, so it is not part of the match and does not advance the position of an m//g regex. This is a bit more compact, but zero-width assertions can get tricky.

while( $string =~ m{ ($letter_re) (?=$letter_re) }gx ) {
    # pos() is still where the last m//g left off.
    # Use substr to print the character before it (the one we matched)
    # and the next one, which we know to be a letter.
    say substr $string, pos($string)-1, 2;
}

UPDATE: I'd originally tried to capture both the match and the look-ahead as m{ ($letter_re (?=$letter_re)) }gx but that didn't work. The look-ahead is zero-width and slips out of the match. Other answers showed that if you put a second capture inside the look-ahead then it can collapse to just...

say "$1$2" while $string =~ m{ ($letter_re) (?=($letter_re)) }gx;
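To make the difference concrete, here is a short sketch comparing the two forms on the question's input. The array names @single and @pairs are chosen for illustration only.

```perl
use strict;
use warnings;

my $letter_re = qr/[a-zA-Z]/;
my $string    = "hello world.";

# Look-ahead inside the group: it is checked, but it consumes
# nothing, so the capture only ever holds one letter.
my @single = $string =~ m{ ($letter_re (?=$letter_re)) }gx;

# Capture inside the look-ahead: the second letter lands in $2.
my @pairs;
push @pairs, "$1$2" while $string =~ m{ ($letter_re) (?=($letter_re)) }gx;

print "@single\n";    # h e l l w o r l
print "@pairs\n";     # he el ll lo wo or rl ld
```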

I leave all the answers here for TMTOWTDI, especially if you're not a regex master.
