
Does Perl's `(?PARNO)` discard its own named captures when it's done?

Do recursive regexes understand named captures? There is a note in the docs for `(??{ code })` saying that it's an independent subpattern with its own set of captures, which are discarded when the subpattern is done, and there's a note in `(?PARNO)` that it's "similar to `(??{ code })`". Is `(?PARNO)` discarding its own named captures when it's done?

I'm writing about Perl's recursive regular expressions for Mastering Perl. perlre already has an example with balanced parens (I show it in "Matching balanced parenthesis in Perl regex"), so I thought I'd try balanced quote marks:

#!/usr/bin/perl
# quotes-nested.pl

use v5.10;

$_ =<<'HERE';
He said 'Amelia said "I am a camel"'
HERE

say "Matched!" if m/
    (
        ['"]
            ( 
                (?: 
                    [^'"]+
                    | 
                    ( (?1) ) 
                )* 
            )
        ['"]
    )
    /xg;

print "
1 => $1
2 => $2
3 => $3
4 => $4
5 => $5
";

This works, and the two quoted strings show up in `$1` and `$3`:

Matched!
1 => 'Amelia said "I am a camel"'
2 => Amelia said "I am a camel"
3 => "I am a camel"
4 => 
5 => 

That's fine. I understand that. However, I don't want to know the numbers. So I make the first capture group a named capture and look in `%-`, expecting to see the two substrings I previously saw in `$1` and `$3`:

use v5.10;

$_ =<<'HERE';
He said 'Amelia said "I am a camel"'
HERE

say "Matched [$+{said}]!" if m/
    (?<said>
        ['"]
            ( 
                (?: 
                    [^'"]+
                    | 
                    (?1) 
                )* 
            )
        ['"]
    )
    /xg;

use Data::Dumper;
print Dumper( \%- );

I only see the first:

Matched ['Amelia said "I am a camel"']!
$VAR1 = {
          'said' => [
                      '\'Amelia said "I am a camel"\''
                    ]
        };

I expected that `(?1)` would repeat everything in the first capture group, including the named capture to `said`. I can fix that a bit by naming a new capture:

use v5.10;

$_ =<<'HERE';
He said 'Amelia said "I am a camel"'
HERE

say "Matched [$+{said}]!" if m/
    (?<said>
        ['"]
            ( 
                (?: 
                    [^'"]+
                    | 
                    (?<said> (?1) ) 
                )* 
            )
        ['"]
    )
    /xg;

use Data::Dumper;
print Dumper( \%- );

Now I get what I expected:

Matched ['Amelia said "I am a camel"']!
$VAR1 = {
          'said' => [
                      '\'Amelia said "I am a camel"\'',
                      '"I am a camel"'
                    ]
        };

I thought that I could fix this by wrapping the named capture in a new unnamed group 1, so that `(?1)` recurses into a group that contains `said`:

use v5.10;

$_ =<<'HERE';
He said 'Amelia said "I am a camel"'
HERE

say "Matched [$+{said}]!" if m/
    (
        (?<said>
        ['"]
            ( 
                (?: 
                    [^'"]+
                    | 
                    (?1)
                )* 
            )
        ['"]
        )
    )
    /xg;

use Data::Dumper;
print Dumper( \%- );

But this doesn't capture the smaller substring in `said` either:

Matched ['Amelia said "I am a camel"']!
$VAR1 = {
          'said' => [
                      '\'Amelia said "I am a camel"\''
                    ]
        };
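The examples print `$+{said}` but dump `%-` because the two hashes differ when several groups share a name: `%+` gives only the leftmost defined group with that name, while `%-` gives an array of every group with that name, in group order. Here is a minimal sketch with two ordinary, non-recursive groups; the name `x` and the input are illustrative only:

```perl
use strict;
use warnings;
use v5.10;

# Two ordinary (non-recursive) groups that share the name "x".
'ab' =~ /(?<x>a)(?<x>b)/ or die "no match";

# Copy the values right away, before any other match can replace them.
my $plus  = $+{x};          # leftmost defined group named x
my @minus = @{ $-{x} };     # every group named x, in group order

say "\$+{x}: $plus";        # $+{x}: a
say "\$-{x}: @minus";       # $-{x}: a b
```

So `%-` reports one value per group that carries the name, not one value per time the group matched.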

I think I understand this, but I also know that there are people here who actually touch the C code that makes it happen. :)

And, as I write this, I think I should override `STORE` in the tie behind `%-` to find out, but then I'd have to figure out how to do that.

After playing around with this, I'm satisfied that what I said in the question is right. Each call to `(?PARNO)` gets a complete and separate set of match variables, which it discards at the end of its run.
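To see the discarding more directly, here is a minimal sketch (not from the original post) that applies the question's extra-named-group trick to three levels of nesting. It assumes parentheses instead of quote marks so the nesting is unambiguous; the group name `group` and the input string are illustrative. Even with a named group wrapped around `(?1)`, `%-` ends up with one value per group, and the innermost `(c)` is lost because group 2 only held it during a recursive call:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use v5.10;

# Three levels of nesting. Parentheses stand in for quote marks so an
# opening delimiter can never be mistaken for a closing one.
my $text = '(a(b(c)))';

my @levels;
if ( $text =~ m/
    (?<group>                       # group 1: one balanced (...) chunk
        \(
            (?:
                [^()]++             # non-paren characters
                |
                (?<group> (?1) )    # group 2: names one nested chunk
            )*
        \)
    )
    /x ) {
    # %- has one slot per *group* named "group", not one per recursion
    @levels = @{ $-{group} };
}

say for @levels;    # (a(b(c)))  then  (b(c))
```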

You can get everything that matched in each subpattern by using an array outside the match operator and pushing onto it at the end of the repeated subpattern, as in this example:

#!/usr/bin/perl
# nested_carat_n.pl

use v5.10;

$_ =<<'HERE';
Outside "Top Level 'Middle Level "Bottom Level" Middle' Outside"
HERE

my @matches;

say "Matched!" if m/
    (?(DEFINE)
        (?<QUOTE_MARK> ['"])
        (?<NOT_QUOTE_MARK> [^'"])
    )
    (
    (?<quote>(?&QUOTE_MARK))
        (?:
            (?&NOT_QUOTE_MARK)++
            |
            (?R)
        )*
    \g{quote}
    )
    (?{ push @matches, $^N })
    /x;

say join "\n", @matches;
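Another option is to let Perl, rather than the regex, handle the outer levels of recursion. The sketch below is an assumption, not the method from the book: a regex with hypothetical group names `quoted`, `quote`, and `body` finds each outermost quoted string (the closing mark must match the opening one via `\k<quote>`), and a hypothetical `nested_quotes` sub calls itself on the body to collect the deeper levels. Unlike the `(?{ ... })` version, it returns the outermost match first:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use v5.10;

# One quoted string: a quote mark, then runs of non-quote characters or
# nested quoted strings, then the SAME quote mark (\k<quote>) again.
my $quoted = qr/
    (?<quoted>
        (?<quote> ['"] )
        (?<body> (?: [^'"]++ | (?&quoted) )* )
        \k<quote>
    )
/x;

# Hypothetical helper: the regex finds each outermost quoted string,
# then the helper calls itself on that string's body for deeper levels.
sub nested_quotes {
    my ($text) = @_;
    my @found;
    while ( $text =~ /$quoted/g ) {
        # Copy the named captures before the recursive call runs new matches
        my $whole = $+{quoted};
        my $body  = $+{body};
        push @found, $whole, nested_quotes($body);
    }
    return @found;
}

my $input = q{Outside "Top Level 'Middle Level "Bottom Level" Middle' Outside"};
say for nested_quotes($input);
```

Because the deeper levels come from ordinary sub calls, nothing depends on a `push` inside `(?{ ... })`, whose side effects are not undone if the regex engine backtracks.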

I go through this in depth in Chapter 2 of Mastering Perl, which you can read for free (at least for a while).
