Perl Regex: Multiple (Overlapping) Matches

Question

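I am looking for a regular expression that behaves as follows:

Input: "hello world."

Output: he, el, ll, lo, wo, or, rl, ld

My idea was something along the lines of: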

    while($string =~ m/(([a-zA-Z])([a-zA-Z]))/g) {
        print "$1-$2 ";
    }

But that does something a bit different.
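
To make the difference concrete, here is a minimal sketch of what that loop does on the sample input. The array name `@seen` is only an illustration and is not part of the original attempt. Because each successful `m//g` match consumes both letters, the next attempt starts after them, so overlapping pairs are never seen.

```perl
use strict;
use warnings;

my $string = "hello world.";
my @seen;    # hypothetical name, used only to collect the matched pairs

# Each successful match consumes two letters, so the next search
# starts after them and the overlapping pairs are skipped.
while ($string =~ m/(([a-zA-Z])([a-zA-Z]))/g) {
    push @seen, $1;
}

print join(", ", @seen), "\n";   # he, ll, wo, rl
```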

Answer 1

This is tricky. You have to capture the pair, save it, and then force the regex engine to backtrack.

You can do that this way:

use v5.10;   # first release with backtracking control verbs

my $string = "hello, world!";
my @saved;

my $pat = qr{
    ( \pL {2} )
    (?{ push @saved, $^N })
    (*FAIL)
}x;

@saved = ();
$string =~ $pat;
my $count = @saved;
printf "Found %d matches: %s.\n", $count, join(", " => @saved);

That produces this:

Found 8 matches: he, el, ll, lo, wo, or, rl, ld.

If you don't have v5.10, or if that gives you a headache, you can use this instead:

my $string = "hello, world!";
my @pairs = $string =~ m{
  # we can only match at positions where the
  # following sneak-ahead assertion is true:
    (?=                 # zero-width look ahead
        (               # begin stealth capture
            \pL {2}     #       save off two letters
        )               # end stealth capture
    )
  # succeed after matching nothing, force reset
}xg;

my $count = @pairs;
printf "Found %d matches: %s.\n", $count, join(", " => @pairs);

This produces the same output as before.

But you might still have a headache.

Answer 2

There is no need to "force backtracking"!

push @pairs, "$1$2" while /([a-zA-Z])(?=([a-zA-Z]))/g;

Though you may want to match any letter rather than the limited set you specified:

push @pairs, "$1$2" while /(\pL)(?=(\pL))/g;
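
The one-liners above match against `$_` and assume `@pairs` already exists. As a self-contained sketch, assuming the sample input "hello world." and matching against an explicit `$string` instead:

```perl
use strict;
use warnings;

my $string = "hello world.";
my @pairs;

# (\pL) consumes one letter; the lookahead captures the following
# letter into $2 without consuming it, so the next match starts there.
push @pairs, "$1$2" while $string =~ /(\pL)(?=(\pL))/g;

print join(", ", @pairs), "\n";   # he, el, ll, lo, wo, or, rl, ld
```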

Answer 3

Here is yet another way. It doesn't use any regex magic. It does use nested maps, but these could easily be converted to for loops if desired.

#!/usr/bin/env perl

use strict;
use warnings;

my $in = "hello world.";
my @words = $in =~ /(\b\pL+\b)/g;

my @out = map {
  my @chars = split '';
  map { $chars[$_] . $chars[$_+1] } ( 0 .. $#chars - 1 );
} @words;

print join ',', @out;
print "\n";

Again, to me this is more readable than a strange regex, but YMMV.
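
For readers who prefer explicit loops, the conversion to for loops might look like the sketch below. It swaps the split-and-index approach for `substr`, which is a choice made here for brevity rather than something taken from the answer.

```perl
use strict;
use warnings;

my $in = "hello world.";
my @out;

for my $word ($in =~ /(\b\pL+\b)/g) {
    # Slide a two-character window across the word; a one-letter
    # word yields an empty range and contributes nothing.
    for my $i (0 .. length($word) - 2) {
        push @out, substr($word, $i, 2);
    }
}

print join(',', @out), "\n";   # he,el,ll,lo,wo,or,rl,ld
```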

Answer 4

I would use a capturing group inside a lookahead...

(?=([a-zA-Z]{2}))
    ------------
         |->group 1 captures two English letters

Try it here

Answer 5

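You can do this by matching a letter, using the pos function to get the position where that match ended, \G to refer to that position in another regex, and substr to read a couple of characters from the string.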

use v5.10;
use strict;
use warnings;

my $letter_re = qr/[a-zA-Z]/;

my $string = "hello world.";
while( $string =~ m{ ($letter_re) }gx ) {
    # Skip it if the next character isn't a letter
    # \G will match where the last m//g left off.
    # It's pos() in a regex.
    next unless $string =~ /\G $letter_re /x;

    # pos() is still where the last m//g left off.
    # Use substr to print the character before it (the one we matched)
    # and the next one, which we know to be a letter.
    say substr $string, pos($string)-1, 2;
}

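You can put the "check the next letter" logic inside the original regex with a zero-width positive lookahead assertion, (?=pattern). Zero-width means it is not part of the matched text and does not advance the position of the m//g regex. This is a bit more compact, but zero-width assertions can get tricky.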

while( $string =~ m{ ($letter_re) (?=$letter_re) }gx ) {
    # pos() is still where the last m//g left off.
    # Use substr to print the character before it (the one we matched)
    # and the next one, which we know to be a letter.
    say substr $string, pos($string)-1, 2;
}
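
As a small illustration of "does not advance the position", the sketch below records `pos()` after each match on the string "hello". The character class is written inline and the array name `@positions` is hypothetical. The position moves forward one character at a time even though each match also looks at the following letter.

```perl
use strict;
use warnings;

my $string = "hello";
my @positions;    # hypothetical name: pos() after each successful match

# The lookahead checks the next letter but consumes nothing,
# so each match advances pos() by exactly one character.
while ($string =~ m{ [a-zA-Z] (?=[a-zA-Z]) }gx) {
    push @positions, pos($string);
}

print "@positions\n";   # 1 2 3 4
```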

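Update: I originally tried capturing both the match and the lookahead with m{ ($letter_re (?=$letter_re)) }gx, but that doesn't work. The lookahead is zero-width, so it slips out of the match. Other people's answers show that if you put a second capture inside the lookahead, then it can collapse to just...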

say "$1$2" while $string =~ m{ ($letter_re) (?=($letter_re)) }gx;

I'm leaving all of the versions here in the spirit of TMTOWTDI, especially for those who aren't regex masters.

5 answers

Answer 1: score 10, accepted, 2013-03-07 18:54:09
Answer 2: score 5, 2013-03-07 19:13:26
Answer 3: score 1, 2013-03-07 19:43:46
Answer 4: score 0, 2013-03-07 18:58:27
Answer 5: score 0, 2013-03-07 19:23:06
