How can I match against multiple regexes in Perl?

I've seen this previous post about matching against multiple regexes: How can I match against multiple regexes in Perl?

I'm looking for the fastest way to match all the values contained in an array against a very big file (500 MB).

The patterns are read from stdin and may contain special characters that must keep their regex meaning (anchors, character classes, etc.). A line counts as a match only when all the patterns are found in it.

Currently I'm using a nested for loop, but I'm not very satisfied with the speed.

Thanks for your suggestions.

Try Regexp::Assemble, as suggested in the post you linked to, and compare that to an iterative approach like grep. Regexp::Assemble should produce the fastest solution, since Perl can optimize the joined regexes rather than scanning the whole line for each one. Since you don't know your input beforehand, your mileage may vary.
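As a point of comparison, here is a minimal sketch of the single-regex idea using only core Perl rather than Regexp::Assemble. An assembled alternation matches a line that contains any one of the patterns, so to require all of them this sketch wraps each pattern in a lookahead instead. That is a swapped-in technique, not what Regexp::Assemble does. The pattern list, the sample lines, and the name matches_all are made up for illustration, and whether this beats a plain loop depends on the patterns and the Perl version.

```perl
use strict;
use warnings;

# Hypothetical patterns; in the question these are read from STDIN.
my @patterns = ('^foo', 'ba[rz]', '\d+$');

# One lookahead per pattern: the combined regex succeeds only if every
# pattern matches somewhere in the line. Each pattern is wrapped in
# (?:...) so that an alternation inside a pattern stays contained.
my $lookaheads = join '', map { "(?=.*?(?:$_))" } @patterns;
my $all_re     = qr/^$lookaheads/;

# Returns 1 if the line satisfies every pattern, 0 otherwise.
sub matches_all {
    my ($line) = @_;
    return $line =~ $all_re ? 1 : 0;
}

for my $line ("foo bar 42\n", "foo qux 42\n") {
    chomp(my $shown = $line);
    print "$shown: ", (matches_all($line) ? "all matched" : "not all matched"), "\n";
}
```

Patterns that contain capture groups or numbered backreferences would need extra care here, because joining them into one regex changes the group numbering.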

Which version of Perl you're using will affect performance. Perl 5.10 introduced a lot of optimizations for exactly this purpose (see "tries"). One of the biggest use cases is spam scanners like SpamAssassin, which build a big regex of all the patterns they scan for, just like Regexp::Assemble.

Finally, since your input is so large, it may be worthwhile to assemble the regex into a file and then run grep -P -f $regex_file $big_file. -P tells grep to use Perl-compatible regular expressions. The file is used to avoid shell quoting problems and command-size limits. grep may blow the doors off Perl.

In the end, you're going to have to do the benchmarking.
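One way to run such a comparison is the core Benchmark module. The sketch below times two candidates, a grep-based count and an explicit loop that gives up on the first failed pattern. The patterns, sample lines, and subroutine names are invented for illustration, and real measurements should use the actual patterns and a representative slice of the real file.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Made-up patterns and sample lines; substitute real data when measuring.
my @regexps = map { qr/$_/ } ('^foo', 'ba[rz]', '\d+$');
my @lines   = ("foo bar 42\n", "foo qux 42\n", "nothing here\n") x 1000;

# Candidate 1: count the matching patterns with grep and compare the
# count to the number of patterns.
sub count_with_grep {
    my $hits = 0;
    for my $line (@lines) {
        my $matched = grep { $line =~ $_ } @regexps;
        $hits++ if $matched == @regexps;
    }
    return $hits;
}

# Candidate 2: explicit loop that skips to the next line as soon as
# one pattern fails.
sub count_with_loop {
    my $hits = 0;
    LINE: for my $line (@lines) {
        for my $re (@regexps) {
            next LINE unless $line =~ $re;
        }
        $hits++;
    }
    return $hits;
}

# Run each candidate for at least one CPU second and print a
# comparison table of their rates.
cmpthese(-1, {
    grep_count => \&count_with_grep,
    early_exit => \&count_with_loop,
});
```

Both candidates should report the same number of matching lines; checking that before trusting the timings guards against benchmarking a broken variant.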

Did you try using grep?

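# @regexps holds the patterns, ideally precompiled with qr//.
while (my $line = <>) {
    # grep in scalar context returns how many patterns matched this line.
    if (scalar(grep($line =~ /$_/, @regexps)) == scalar(@regexps)) {
        # ... all patterns matched
    }
}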
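The grep check evaluates every pattern even after one has already failed. A hedged variant that compiles each pattern once and stops at the first miss is sketched below. The helper names compile_patterns and matches_all, and the command-line convention (patterns on STDIN, file name as the first argument), are assumptions for illustration and not part of the original answer.

```perl
use strict;
use warnings;

# Compile each pattern string once; die early on an invalid regex.
sub compile_patterns {
    my @compiled;
    for my $pattern (@_) {
        my $re = eval { qr/$pattern/ };
        die "Invalid pattern '$pattern': $@" unless defined $re;
        push @compiled, $re;
    }
    return @compiled;
}

# True only if every compiled pattern matches; stops at the first miss.
sub matches_all {
    my ($line, $regexps) = @_;
    for my $re (@$regexps) {
        return 0 unless $line =~ $re;
    }
    return 1;
}

# Assumed usage: perl script.pl big_file < patterns.txt
if (@ARGV) {
    chomp(my @patterns = <STDIN>);
    my @regexps = compile_patterns(grep { length } @patterns);
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    while (my $line = <$fh>) {
        print $line if matches_all($line, \@regexps);
    }
    close $fh;
}
```

Putting the patterns most likely to fail first lets the loop reject non-matching lines sooner, which may matter on a 500 MB input.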
