
How can I match against multiple regexes in Perl?

I would like to check whether some string matches any of a given set of regexes. How can I do that?

Use smart matching if you have Perl version 5.10 or newer!

#! /usr/bin/env perl

use warnings;
use strict;

use feature 'switch';

my @patterns = (
  qr/foo/,
  qr/bar/,
  qr/baz/,
);

for (qw/ blurfl bar quux foo baz /) {
  no warnings 'experimental::smartmatch';
  print "$_: ";
  given ($_) {
    when (@patterns) {
      print "hit!\n";
    }
    default {
      print "miss.\n";
    }
  }
}

Although you don't see an explicit ~~ operator, Perl's given/when does it behind the scenes:

Most of the power comes from the implicit smartmatching that can sometimes apply. Most of the time, when(EXPR) is treated as an implicit smartmatch of $_, that is, $_ ~~ EXPR. (See Smartmatch Operator in perlop for more information on smartmatching.)

"Smartmatch Operator" in perlop gives a table of many combinations you can use, and the above code corresponds to the case where $a is Any and $b is Array, which corresponds roughly to

grep $a ~~ $_, @$b

except that the search short-circuits, i.e., returns quickly on a match rather than processing all elements. In the implicit loop, then, we're smart matching Any against Regex, which is

$a =~ /$b/

Output:

blurfl: miss.
bar: hit!
quux: miss.
foo: hit!
baz: hit!

Addendum

Since this answer was originally written, Perl's designers have realized there were mistakes in the way smartmatching works, and so it is now considered an experimental feature. The case used above is not one of the controversial uses; nonetheless, the code's output would include the warnings "given is experimental" and "when is experimental" had I not added no warnings 'experimental::smartmatch';.

Any use of experimental features involves some risk, but I'd estimate the likelihood as low in this case. When using code similar to the above and upgrading to a newer version of Perl, this is a potential gotcha to be aware of.
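For readers who would rather avoid the experimental feature altogether, the same short-circuiting check can be sketched without smartmatch. This is only an illustration, not part of the original answer: the helper name matches_any is hypothetical, and it assumes List::Util 1.33 or later (bundled with Perl 5.20 and newer) for any().

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use List::Util qw(any);   # any() requires List::Util 1.33+ (core since Perl 5.20)

my @patterns = ( qr/foo/, qr/bar/, qr/baz/ );

# Hypothetical helper: true if $string matches at least one pattern.
# any() stops at the first pattern that matches, like the smartmatch version.
sub matches_any {
    my ($string, @regexes) = @_;
    return any { $string =~ $_ } @regexes;
}

for my $word (qw/ blurfl bar quux foo baz /) {
    printf "%s: %s\n", $word, matches_any($word, @patterns) ? 'hit!' : 'miss.';
}
```

On an older Perl, a plain foreach loop with last on the first match gives the same behavior.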

From perlfaq6's answer to "How do I efficiently match many regular expressions at once?", in this case the latest development version, which I just updated with a smart match example.


How do I efficiently match many regular expressions at once?

(contributed by brian d foy)

If you have Perl 5.10 or later, this is almost trivial. You just smart match against an array of regular expression objects:

my @patterns = ( qr/Fr.d/, qr/B.rn.y/, qr/W.lm./ );

if( $string ~~ @patterns ) {
    ...
    };

The smart match stops when it finds a match, so it doesn't have to try every expression.

Earlier than Perl 5.10, you have a bit of work to do. You want to avoid compiling a regular expression every time you want to match it. In this example, perl must recompile the regular expression for every iteration of the foreach loop since it has no way to know what $pattern will be:

my @patterns = qw( foo bar baz );

LINE: while( <DATA> ) {
    foreach my $pattern ( @patterns ) {
        if( /\b$pattern\b/i ) {
            print;
            next LINE;
            }
        }
    }

The qr// operator showed up in perl 5.005. It compiles a regular expression, but doesn't apply it. When you use the pre-compiled version of the regex, perl does less work. In this example, I inserted a map to turn each pattern into its pre-compiled form. The rest of the script is the same, but faster:

my @patterns = map { qr/\b$_\b/i } qw( foo bar baz );

LINE: while( <> ) {
    foreach my $pattern ( @patterns ) {
        if( /$pattern/ )
            {
            print;
            next LINE;
            }
        }
    }

In some cases, you may be able to make several patterns into a single regular expression. Beware of situations that require backtracking though.

my $regex = join '|', qw( foo bar baz );

LINE: while( <> ) {
    print if /\b(?:$regex)\b/i;
    }
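A variation on the single-regex approach, sketched under one assumption that the FAQ text does not make: the inputs here are treated as literal words rather than regex fragments, so each is escaped with quotemeta before joining. The combined pattern is also compiled once with qr// so it can be reused. The variable names are illustrative only.

```perl
use strict;
use warnings;

# Assumption: these are literal words, not regex fragments,
# so metacharacters such as '.' must be escaped before joining.
my @words = ( 'foo', 'bar', 'a.b' );

my $alternation = join '|', map { quotemeta($_) } @words;

# Compile once with qr// and reuse the result for every input line.
my $regex = qr/\b(?:$alternation)\b/i;

for my $line ( 'Foo fighters', 'axb', 'a.b' ) {
    print "$line\n" if $line =~ $regex;
}
```

Without quotemeta, the word 'a.b' would also match 'axb', because the dot would be read as "any character".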

For more details on regular expression efficiency, see Mastering Regular Expressions by Jeffrey Friedl. He explains how regular expression engines work and why some patterns are surprisingly inefficient. Once you understand how perl applies regular expressions, you can tune them for individual situations.

My go-to for testing a value against multiple regexes at once is Regexp::Assemble, which will "Assemble multiple Regular Expressions into a single RE" in a manner somewhat more intelligent and optimized than simply doing a join '|', @regexps. You are also able, by default, to retrieve the portion of the text which matched and, if you need to know which pattern matched, the track switch will provide that information. Its performance is quite good (in one application, I'm using it to test against 1700 patterns at once), and I have yet to need anything that it doesn't do.

I'm not exactly sure what you are looking for, but something like this?

#!/usr/bin/perl
@regexes = ( qr/foo/ , qr/bar/ );
while ($line=<>){
  chomp $line;
  $match=0;
  for $re (@regexes){
    $match++ if ($line =~ $re);
  }
  print "$line matches $match regexes\n";
}
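The counting loop above can also be written with grep, which returns the matching patterns in list context and their count in scalar context. This is a minimal sketch that uses fixed sample strings instead of reading input; the helper name matching_patterns is made up for the example.

```perl
use strict;
use warnings;

my @regexes = ( qr/foo/, qr/bar/ );

# Hypothetical helper: returns the patterns that match $line.
# Called in scalar context, it returns the number of matches instead.
sub matching_patterns {
    my ($line, @candidates) = @_;
    return grep { $line =~ $_ } @candidates;
}

for my $line ( 'foobar', 'bar', 'quux' ) {
    my $count = matching_patterns($line, @regexes);
    print "$line matches $count regexes\n";
}
```

Unlike the short-circuiting approaches, this tries every pattern, which is what you want when you need the count or the list of matches rather than a yes/no answer.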

You could also compile all of them into a single regex like this:

#!/usr/bin/perl
@regexes = ( qr/foo/ , qr/bar/ );
$allre= "(".join("|",@regexes).")";
$compiled=qr/$allre/;
while(<>){
  chomp;
  print "$_ matches ($1)\n" if /$compiled/;
}
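One detail worth knowing about joining qr// objects this way: each object stringifies with its own flags embedded, so a flag such as /i stays local to the pattern it was written on. The following sketch, with sample strings chosen only for illustration, shows that behavior.

```perl
use strict;
use warnings;

# The first pattern is case-insensitive, the second is not.
my @regexes = ( qr/foo/i, qr/bar/ );

# Each qr// object stringifies with its flags embedded,
# so joining them keeps /i local to the first alternative.
my $alternation = join '|', @regexes;
my $compiled    = qr/($alternation)/;

for my $line ( 'FOO', 'BAR', 'bar' ) {
    if ( $line =~ $compiled ) {
        print "$line matches ($1)\n";
    }
    else {
        print "$line: no match\n";
    }
}
```

Here 'FOO' and 'bar' match, while 'BAR' does not, because only the first pattern carries the /i flag.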

Hope that helps.

If using a large number of regexps, you might be interested in Regexp::Optimizer.

From the synopsis section:

use Regexp::Optimizer;
my $o  = Regexp::Optimizer->new;
my $re = $o->optimize(qr/foobar|fooxar|foozap/);
# $re is now qr/foo(?:[bx]ar|zap)/

That might be more efficient, if you're willing to install an extra module.
