
DFA-Based Regular Expression Engines for Java with Capture

Are there any (free) regular expression engines for Java that can compile a regular expression to a DFA and do group capturing while matching with the DFA?

I've found dk.brics.automaton and jrexx, which both compile to a DFA, but neither seems to be able to do group capture. The other engines I've found seem to compile to an NFA.
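To make the gap concrete, here is a minimal sketch of what a plain DFA match gives you compared with what group capture gives you. The pattern `[a-z]+[0-9]+`, the class name `DfaSketch`, and the hand-written transition function are all assumptions made for illustration; this is not how dk.brics.automaton or jrexx are implemented. The DFA walk only ends in an accepting or non-accepting state, while java.util.regex (a backtracking engine) also reports where each group matched.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DfaSketch {
    // Hypothetical hand-built DFA for [a-z]+[0-9]+
    // States: 0 = start, 1 = reading letters, 2 = reading digits (accepting).
    static final int DEAD = -1;

    static int step(int state, char c) {
        boolean letter = c >= 'a' && c <= 'z';
        boolean digit = c >= '0' && c <= '9';
        switch (state) {
            case 0: return letter ? 1 : DEAD;
            case 1: return letter ? 1 : (digit ? 2 : DEAD);
            case 2: return digit ? 2 : DEAD;
            default: return DEAD;
        }
    }

    // One pass over the input, one state variable: the result is only yes/no.
    static boolean dfaMatches(String input) {
        int state = 0;
        for (int i = 0; i < input.length() && state != DEAD; i++) {
            state = step(state, input.charAt(i));
        }
        return state == 2;
    }

    // The same language with capturing groups, using java.util.regex.
    // Returns {letters, digits}, or null when the input does not match.
    static String[] regexGroups(String input) {
        Matcher m = Pattern.compile("([a-z]+)([0-9]+)").matcher(input);
        return m.matches() ? new String[] { m.group(1), m.group(2) } : null;
    }

    public static void main(String[] args) {
        System.out.println(dfaMatches("abc123"));
        String[] groups = regexGroups("abc123");
        System.out.println(groups[0] + " / " + groups[1]);
    }
}
```

The DFA version never records where the letters end and the digits begin, which is exactly the information a capturing group needs.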

Try this one (probably not DFA-based, but faster than java.util.regex): http://jregex.sourceforge.net/gstarted-advanced.html#ngroups , or this one: http://userguide.icu-project.org

According to this benchmark: http://tusker.org/regex/regex_benchmark.html , both are fast (though we all know a benchmark only tests what its creator wanted to test).

When I needed a really fast DFA regex, I spawned a process that ran grep ;-) (For a 6 GB log file, that cut my run time from 10 minutes to a few seconds.)
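A rough sketch of that grep trick from Java, assuming a `grep` binary is on the PATH. The class name `GrepSketch` and both method names are made up for illustration. Note that this only returns the matching lines; it does not give you capture groups.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class GrepSketch {
    // Builds the command line. -E selects extended regular expressions;
    // "--" ends option parsing so a pattern starting with '-' is not read as a flag.
    static List<String> grepCommand(String pattern, String file) {
        return Arrays.asList("grep", "-E", "--", pattern, file);
    }

    // Runs grep and collects the matching lines (assumes UTF-8 output).
    static List<String> grep(String pattern, String file)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(grepCommand(pattern, file))
                .redirectError(ProcessBuilder.Redirect.INHERIT) // show grep's own errors
                .start();
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        // grep exits with 0 on a match, 1 on no match, and >1 on an error.
        int exitCode = process.waitFor();
        if (exitCode > 1) {
            throw new IOException("grep failed with exit code " + exitCode);
        }
        return lines;
    }
}
```

If you still need groups, one option is to run java.util.regex over just the lines grep returns, which is usually a much smaller input than the whole file.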

I recently wrote one: tree-regex

For C there are the TRE and Google RE2 libraries. TRE uses a DFA and RE2 uses an NFA (as far as I understand), and both can do subgroup matching. But I haven't seen such a library for Java.

You could try the Pat regular expression library at http://www.javaregex.com/

dk.brics.automaton is DFA-based and does appear to support capturing groups. I expect that feature was added in the two years since this question was asked. Check out the class AutomatonMatcher.

See http://www.brics.dk/automaton/doc/dk/brics/automaton/AutomatonMatcher.html#group(int)

