
Regular expression matching algorithm in Java

This article says that regexp matching in Java is slow because regexps with "back references" cannot be matched efficiently. The article explains Thompson's efficient NFA-based matching algorithm (invented in 1968), which works for regexps without "back references". However, the Pattern javadoc says Java regexps use an NFA-based approach.
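For readers unfamiliar with the term, here is a minimal sketch of what a "back reference" looks like in java.util.regex. The class name BackReferenceDemo and the helper firstDoubledWord are made up for illustration; only Pattern and Matcher come from the JDK. The \1 in the pattern must match exactly the text that group 1 captured, and this dependence on earlier captured text is the feature that the automaton-based approach described in the article does not cover.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BackReferenceDemo {
    // \1 is a back reference: it must repeat whatever group 1 captured.
    // (Hypothetical example pattern: a word followed by the same word.)
    private static final Pattern DOUBLED_WORD =
            Pattern.compile("\\b(\\w+)\\s+\\1\\b");

    // Returns the first word that appears twice in a row, or null if none.
    static String firstDoubledWord(String text) {
        Matcher m = DOUBLED_WORD.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(firstDoubledWord("this is is a test")); // prints "is"
        System.out.println(firstDoubledWord("no repeats here"));   // prints "null"
    }
}
```

The example only shows the feature itself. It makes no claim about how fast or slow Pattern is on any particular input.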

Now I wonder how efficient Java regexp matching is and what algorithm it uses.

java.util.regex.Pattern uses the Boyer-Moore string search algorithm when the root of the compiled pattern is a literal slice. From the Pattern source:

/* Attempts to match a slice in the input using the Boyer-Moore string
 * matching algorithm. The algorithm is based on the idea that the
 * pattern can be shifted farther ahead in the search text if it is
 * matched right to left.
 */

private void compile() {
    // ... (earlier part of the method elided in the original post)

    if (matchRoot instanceof Slice) {
        root = BnM.optimize(matchRoot);
        if (root == matchRoot) {
            root = hasSupplementary ? new StartS(matchRoot) : new Start(matchRoot);
        }
    } else if (matchRoot instanceof Begin || matchRoot instanceof First) {
        root = matchRoot;
    } else {
        root = hasSupplementary ? new StartS(matchRoot) : new Start(matchRoot);
    }
}
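To make the idea in the source comment concrete (compare right to left, then shift the pattern farther ahead), here is a rough sketch of the Horspool simplification of Boyer-Moore. It is not the JDK's BnM implementation: the class HorspoolSketch and its indexOf method are hypothetical, and it uses only a bad-character shift table over plain char values.

```java
import java.util.Arrays;

public class HorspoolSketch {
    // Returns the index of the first occurrence of pattern in text, or -1.
    // Simplified Boyer-Moore-Horspool: bad-character shifts only.
    static int indexOf(String text, String pattern) {
        int n = text.length();
        int m = pattern.length();
        if (m == 0) {
            return 0;
        }
        // shift[c] = how far the window may move when c is the text
        // character aligned with the last pattern position.
        int[] shift = new int[Character.MAX_VALUE + 1];
        Arrays.fill(shift, m);
        for (int i = 0; i < m - 1; i++) {
            shift[pattern.charAt(i)] = m - 1 - i;
        }
        int pos = 0;
        while (pos <= n - m) {
            // Compare the window against the pattern from right to left.
            int j = m - 1;
            while (j >= 0 && text.charAt(pos + j) == pattern.charAt(j)) {
                j--;
            }
            if (j < 0) {
                return pos; // every character matched
            }
            // Skip ahead based on the text character under the pattern's end.
            pos += shift[text.charAt(pos + m - 1)];
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(indexOf("hello world", "world")); // prints 6
        System.out.println(indexOf("abc", "x"));             // prints -1
    }
}
```

On a mismatch the window can jump by up to the full pattern length, which is why this family of algorithms suits literal substrings such as the Slice case in compile().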
