Perl - backreference not available when capture group is followed by ?

When a capture group is followed by a question mark, the backreference appears to be unavailable:

my $test = "this is a very long day indeed";

if ($test =~ m/^this.+(very).+(indeed)?/) {
  print "It matched the regex.\n";
  print "$1 :: $2\n";
}

This prints:

It matched the regex.
very :: 

Is this normal behaviour? I can't find any mention of it in the documentation. I'm trying to match lines in a log file where the second capture group may or may not be present.

It's not a backreference problem. The characters you expect in your last group are matched by .+ instead of by the optional capturing group, so that group matches nothing and $2 is left undefined (which prints as an empty string).

The problem is the greedy quantifier in front of the group: it matches as many characters as it can. Since your last group is optional, .+ runs to the end of the line and the regex engine has no need to backtrack to make the match succeed (so it never needs to find "indeed").
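To see where the characters go, here is a small sketch that uses the pattern from the question with one change of my own: the second .+ is wrapped in a capture group so its contents can be printed. The variable names are only for illustration.

```perl
use strict;
use warnings;

my $test = "this is a very long day indeed";

# Same pattern as in the question, except that the second .+ is also
# captured so we can see what it consumed.
my ($very, $rest, $indeed) = $test =~ m/^this.+(very)(.+)(indeed)?/;

print "very   = '$very'\n";
print "rest   = '$rest'\n";
# A group that took no part in the match is undef, not an empty string.
print "indeed = ", (defined $indeed ? "'$indeed'" : "undef"), "\n";
```

The greedy .+ keeps " long day indeed" for itself, and the optional group is satisfied by matching nothing.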

A simple way to solve the problem is to use a lazy quantifier instead, together with an end anchor that forces the match to reach the end of the line (on its own, a lazy quantifier stops as soon as possible):

m/^this.+(very).+?(indeed)?$/

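Note: if "indeed" is not always at the end of the string, adding .* before the $ is not enough, because the lazy .+? would then stop after a single character and the optional group would again match nothing. In that case, put the lazy quantifier and the capture together inside an optional non-capturing group and drop the anchor: m/^this.+(very)(?:.+?(indeed))?/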
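As a quick check of the anchored pattern, the sketch below runs it against one line that ends in "indeed" and one that does not. The helper name capture_pair and the second sample line are made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: returns ($1, $2) for the anchored, lazy pattern.
sub capture_pair {
    my ($line) = @_;
    return $line =~ m/^this.+(very).+?(indeed)?$/;
}

my @with    = capture_pair("this is a very long day indeed");
my @without = capture_pair("this is a very long day");

for my $result (\@with, \@without) {
    my ($first, $second) = @$result;
    # $second is undef when the optional group did not take part.
    print "$first :: ", (defined $second ? $second : "(not present)"), "\n";
}
```

Testing $2 with defined, instead of interpolating it directly, also avoids an "uninitialized value" warning when the optional group is absent.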

This is an additional note about greediness, which was your problem (as Casimir's answer explains).

Keep in mind that quantifiers are greedy by default: each one consumes as much as it can, and gives characters back only when the sub-expression to its right would otherwise fail to match.

Any time you are about to use a greedy quantifier on the dot metacharacter, such as .+, it should raise a red flag and make you think twice. If it can, it will blow right past what you intended to match.

For this reason, try to replace it with something more specific that has no chance of going past your intended target.

Modifying your sample regex slightly shows how this can happen.

 my $test = "this is a very long day indeed, very long.";

 if ($test =~ m/

       ^
       ( this )               # (1)
       ( .+ )                 # (2)
       ( very )               # (3)
       ( .+ )                 # (4)
       ( indeed )?            # (5)

 /x) {
   print "All  = '$&'\n";
   print "grp1 = '$1'\n";
   print "grp2 = '$2'\n";
   print "grp3 = '$3'\n";
   print "grp4 = '$4'\n";
   # Group 5 takes no part in the match, so $5 is undef.
   print "grp5 = ", (defined $5 ? "'$5'" : "undef"), "\n";
 }

 # Output >>
 # 
 # All  = 'this is a very long day indeed, very long.'
 # grp1 = 'this'
 # grp2 = ' is a very long day indeed, '
 # grp3 = 'very'
 # grp4 = ' long.'
 # grp5 = undef
 # 
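For comparison, here is one possible way to keep the quantifiers from running past their targets on the same string. This is only a sketch under my own assumptions, not the only fix: both dots are made lazy, and the second one is placed inside an optional non-capturing group together with the "indeed" capture, so the engine has to look for "indeed" before it may give up on it.

```perl
use strict;
use warnings;

my $test = "this is a very long day indeed, very long.";

my ($very, $indeed) = $test =~ m/
      ^ this
      .+?                    # lazy: stop at the first "very"
      ( very )               # (1)
      (?: .+? ( indeed ) )?  # (2) optional, but searched for first
/x;
my $all = $&;

print "All  = '$all'\n";
print "grp1 = '$very'\n";
print "grp2 = ", (defined $indeed ? "'$indeed'" : "undef"), "\n";
```

Here the overall match stops right after "indeed", and the second "very" in the string is never reached.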
