
Multiple Pattern Match Algorithm

I have a lot of logs, and every record contains a URL. I also have 2000+ URL patterns for filtering the logs. Some of the patterns are regular expressions with capturing groups. For each URL I want to get the matched pattern and, if possible, the captured groups. Is there a Java library that can help me, or an algorithm that solves this problem, or anything else related to it? Thanks a lot.

Take a look at Java's regular expression library (java.util.regex).

You can construct a single large pattern by concatenating your original patterns with | between them (wrap each pattern in () so that each alternative is treated as a unit).

A regular expression can be compiled into an efficient matching finite automaton that you run over your data. (Note that java.util.regex itself uses a backtracking matcher, so it is worth measuring how a very large alternation performs on your logs.) Just make sure you compile the pattern once and reuse it for every record.
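As a rough illustration of the two points above, here is a minimal sketch that joins the patterns with | and compiles the result once. The class name `CombinedUrlFilter`, the three sample patterns, and the example URLs are all made up for this example; the real list would hold your 2000+ patterns.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class CombinedUrlFilter {
    // Hypothetical sample patterns standing in for the real pattern list.
    static final List<String> RAW_PATTERNS = Arrays.asList(
            "/user/\\d+",
            "/product/[a-z]+/\\d+",
            "/search\\?q=.*");

    // Each pattern is wrapped in a non-capturing group and the groups are joined with |.
    // The field is static final, so the combined pattern is compiled only once.
    static final Pattern COMBINED = Pattern.compile(
            RAW_PATTERNS.stream()
                    .map(p -> "(?:" + p + ")")
                    .collect(Collectors.joining("|")));

    // Returns true if any of the patterns occurs somewhere in the URL.
    static boolean matchesAny(String url) {
        // Only the lightweight Matcher is created per record; the Pattern is reused.
        return COMBINED.matcher(url).find();
    }

    public static void main(String[] args) {
        for (String url : Arrays.asList(
                "http://example.com/user/42",
                "http://example.com/about")) {
            System.out.println(url + " -> " + matchesAny(url));
        }
    }
}
```

This version only answers whether a record matches at all; it does not say which pattern matched, because non-capturing groups record nothing.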

It will handle extracting groups, but you need to handle the groups in a generic way, since any of the groups may be the one that matched. If it helps, consider using named groups to make the handling simpler.
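One possible way to handle the groups generically is sketched below. It wraps pattern number i in a named group `p<i>` (a naming scheme invented for this example) to find out which alternative matched, and it precomputes where each pattern's own capturing groups land in the combined pattern. The class name and the two sample patterns are hypothetical, and the sketch assumes the original patterns contain no numbered backreferences such as `\1`, since group numbers shift when the patterns are combined.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedGroupUrlMatcher {
    // Hypothetical sample patterns, each with its own capturing groups.
    static final List<String> RAW_PATTERNS = Arrays.asList(
            "/user/(\\d+)",
            "/product/([a-z]+)/(\\d+)");

    // Number of capturing groups inside each original pattern.
    static final int[] GROUP_COUNTS = new int[RAW_PATTERNS.size()];
    // Group number of each pattern's wrapping named group in the combined pattern.
    static final int[] OUTER_INDEX = new int[RAW_PATTERNS.size()];
    // Built once; the two arrays above are filled in as a side effect of build().
    static final Pattern COMBINED = build();

    static Pattern build() {
        StringBuilder sb = new StringBuilder();
        int next = 1; // number of the next capturing group in the combined pattern
        for (int i = 0; i < RAW_PATTERNS.size(); i++) {
            String p = RAW_PATTERNS.get(i);
            GROUP_COUNTS[i] = Pattern.compile(p).matcher("").groupCount();
            OUTER_INDEX[i] = next;
            next += 1 + GROUP_COUNTS[i]; // the wrapper plus the pattern's own groups
            if (i > 0) sb.append('|');
            sb.append("(?<p").append(i).append('>').append(p).append(')');
        }
        return Pattern.compile(sb.toString());
    }

    // Returns the matched pattern followed by its captured groups,
    // or an empty list if no pattern matches the URL.
    static List<String> match(String url) {
        List<String> result = new ArrayList<>();
        Matcher m = COMBINED.matcher(url);
        if (!m.find()) return result;
        for (int i = 0; i < RAW_PATTERNS.size(); i++) {
            if (m.group("p" + i) != null) { // this alternative took part in the match
                result.add(RAW_PATTERNS.get(i));
                for (int g = 1; g <= GROUP_COUNTS[i]; g++) {
                    result.add(m.group(OUTER_INDEX[i] + g));
                }
                break;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(match("http://example.com/product/books/7"));
        System.out.println(match("http://example.com/about"));
    }
}
```

The loop over group names is linear in the number of patterns for every matched record; with thousands of patterns it may be worth checking the precomputed group numbers directly instead, or splitting the patterns into several smaller combined expressions.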

