
Odd Perl Regex Behavior with Parens

I'm pulling in some Wikipedia markup and I want to match the URLs in relative (on-Wikipedia) links. I don't want to match any URL containing a colon (not counting the protocol colon), to avoid special pages and the like, so I have the following code:

while ($body =~ m|<a href="(?<url>/wiki/[^:"]+)|gis) { 
  my $url = $+{url};
  print "$url\n";
}

Unfortunately, this code is not working quite as expected. Any URL that contains a parenthetical, e.g. /wiki/Eon_(geology), is getting truncated just before the opening paren, so that URL would match as /wiki/Eon_. I've been looking at the code for a bit and I cannot figure out what I'm doing wrong. Can anyone provide some insight?

There isn't anything wrong with this code as it stands, so long as your Perl is new enough to support these RE features (named captures and the %+ hash were introduced in Perl 5.10). Tested with Perl 5.10.1.

$body = <<"__ENDHTML__";
<a href="/wiki/Eon_(geology)">Body</a> Blah blah 
<a href="/wiki/Some_other_(parenthesis)">Body</a>
__ENDHTML__

while ($body =~ m|<a href="(?<url>/wiki/[^:"]+)|gis) { 
  my $url = $+{url};
  print "$url\n";
}

Are you using an old Perl?
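To rule out the interpreter version, a minimal sketch along these lines might help. It prints the running version via $] and uses a plain numbered capture instead of (?<url>...), so it should also compile on Perls older than 5.10. The sample markup and the @urls array are assumptions made for illustration and are not part of the original code.

```perl
use strict;
use warnings;

# Hypothetical sample input; real markup would come from the fetched page.
my $body = '<a href="/wiki/Eon_(geology)">Eon</a>';

# $] holds the running interpreter's version as a number (5.010001 for 5.10.1).
print "Running Perl $]\n";

# Numbered capture ($1) instead of the named capture, which needs 5.10 or later.
my @urls;
while ($body =~ m|<a href="(/wiki/[^:"]+)|gis) {
    push @urls, $1;
}
print "$_\n" for @urls;
```

If this prints the full URL including the parenthetical, the regex itself is behaving and the truncation likely comes from somewhere else in the pipeline.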

You didn't anchor the RE to the end of the href string. Put a " after the capture group.
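A small sketch of what the closing quote changes, assuming a hypothetical two-link sample (the Special:Random link and the @loose/@strict names are made up for illustration). Numbered groups in list context are used here only for brevity.

```perl
use strict;
use warnings;

# Hypothetical sample: one ordinary article link and one special-page link.
my $body = <<'__ENDHTML__';
<a href="/wiki/Eon_(geology)">Eon</a>
<a href="/wiki/Special:Random">Random</a>
__ENDHTML__

# Without the closing quote, the second link still matches, truncated at the colon.
my @loose = $body =~ m|<a href="(/wiki/[^:"]+)|gis;

# With the closing quote, the character class must run to the end of the
# attribute value, so a URL containing a colon fails to match at all.
my @strict = $body =~ m|<a href="(/wiki/[^:"]+)"|gis;

print "loose:  @loose\n";
print "strict: @strict\n";
```

Without the quote, a colon only shortens the match; with it, colon-containing URLs are skipped entirely, which appears to be what the question intended.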

While that is a problem, it isn't the problem he was trying to solve. The problem he was trying to solve was that there was nothing in the RE to match the protocol and hostname (http://en.wiki...). Adding a .*? before the "(?" would help with that.
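A hedged sketch of that suggestion follows. It swaps in [^"]*? as a tighter variant of the proposed .*?, since under the /s flag a bare .*? could run past the end of the href attribute. The sample links and the @urls array are assumptions for illustration only, and the named capture requires Perl 5.10 or later.

```perl
use strict;
use warnings;

# Hypothetical sample: one relative link and one absolute link.
my $body = <<'__ENDHTML__';
<a href="/wiki/Eon_(geology)">Eon</a>
<a href="http://en.wikipedia.org/wiki/Era_(geology)">Era</a>
__ENDHTML__

my @urls;
# [^"]*? lazily skips an optional protocol and hostname (including the
# protocol colon) but cannot cross the closing quote of the attribute.
while ($body =~ m|<a href="[^"]*?(?<url>/wiki/[^:"]+)|gis) {
    push @urls, $+{url};
}
print "$_\n" for @urls;
```

Both links should yield their /wiki/... path, with the protocol and hostname left outside the named capture.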
