Regular expression matching algorithm in Java

Question

This article says that regexp matching in Java is slow because regexps with "back references" cannot be matched efficiently. The article explains the efficient Thompson NFA-based matching algorithm (invented in 1968), which works for regexps without "back references". However, the Pattern javadoc says Java regexps use an NFA-based approach.
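
The lockstep state-set simulation behind the Thompson approach can be sketched in Java as below. This is a minimal illustration under stated assumptions, not the article's code and not how java.util.regex works: there is no regex parser, the NFA for the pattern a?^n a^n (n optional a's followed by n required a's) is wired up by hand, and the ThompsonSketch, Kind, State and buildOptionalThenRequired names are all hypothetical. Because each step keeps at most one copy of every NFA state, the work grows with (number of states) x (input length) instead of with the number of alternative paths.

```java
import java.util.HashSet;
import java.util.Set;

public class ThompsonSketch {
    // Hypothetical minimal NFA node: CHAR consumes one character,
    // SPLIT is an unlabeled (epsilon) fork to two states, MATCH accepts.
    enum Kind { CHAR, SPLIT, MATCH }

    static final class State {
        final Kind kind;
        final char c;
        State out, out1;
        State(Kind kind, char c) { this.kind = kind; this.c = c; }
    }

    // Add s plus every state reachable from it through SPLIT edges.
    static void addState(Set<State> set, State s) {
        if (s == null || !set.add(s)) return;
        if (s.kind == Kind.SPLIT) {
            addState(set, s.out);
            addState(set, s.out1);
        }
    }

    // Simulate all NFA paths in lockstep: one pass over the input,
    // at most one copy of each state per step, and no backtracking.
    static boolean matches(State start, String input) {
        Set<State> current = new HashSet<>();
        addState(current, start);
        for (int i = 0; i < input.length(); i++) {
            char ch = input.charAt(i);
            Set<State> next = new HashSet<>();
            for (State s : current) {
                if (s.kind == Kind.CHAR && s.c == ch) addState(next, s.out);
            }
            current = next;
        }
        for (State s : current) {
            if (s.kind == Kind.MATCH) return true;
        }
        return false;
    }

    // Hand-built NFA for a?^n a^n, assembled back to front.
    static State buildOptionalThenRequired(int n) {
        State next = new State(Kind.MATCH, '\0');
        for (int i = 0; i < n; i++) {          // the n required 'a's
            State a = new State(Kind.CHAR, 'a');
            a.out = next;
            next = a;
        }
        for (int i = 0; i < n; i++) {          // the n optional 'a?' in front of them
            State a = new State(Kind.CHAR, 'a');
            a.out = next;
            State split = new State(Kind.SPLIT, '\0');
            split.out = a;     // take the optional 'a'
            split.out1 = next; // or skip it
            next = split;
        }
        return next;
    }

    public static void main(String[] args) {
        State nfa = buildOptionalThenRequired(30);
        System.out.println(matches(nfa, "a".repeat(30))); // true
        System.out.println(matches(nfa, "a".repeat(29))); // false
    }
}
```

A backtracking matcher given the same pattern may instead try up to 2^n ways of choosing which optional a's to use, which is why pattern shapes like this one are a common demonstration of slow backtracking.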

Now I wonder how efficient Java regexp matching is and what algorithm it uses.
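
Following the article's argument, one quick way to see that java.util.regex cannot be a pure Thompson NFA is that it accepts back references. The small, hypothetical BackrefDemo class below uses only the standard Pattern/Matcher API: \1 must repeat whatever group 1 captured, and the greedy a* in the second pattern has to give a character back before "ab" can match, which is the kind of behaviour a backtracking matcher provides.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BackrefDemo {
    // \1 must repeat exactly the text captured by group 1 (a back reference).
    static final Pattern REPEATED_WORD = Pattern.compile("(\\w+) \\1");

    static boolean isRepeatedWord(String s) {
        return REPEATED_WORD.matcher(s).matches();
    }

    // Greedy a* first consumes every 'a', then has to give one back
    // so that the trailing "ab" can still match.
    static String greedyGroup(String s) {
        Matcher m = Pattern.compile("(a*)ab").matcher(s);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(isRepeatedWord("hello hello")); // true
        System.out.println(isRepeatedWord("hello world")); // false
        System.out.println(greedyGroup("aaab"));           // aa
    }
}
```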

Answer 1

java.util.regex.Pattern uses the Boyer–Moore string search algorithm when the compiled pattern is a plain literal (a Slice node), as this excerpt from the JDK source shows:

/* Attempts to match a slice in the input using the Boyer-Moore string
 * matching algorithm. The algorithm is based on the idea that the
 * pattern can be shifted farther ahead in the search text if it is
 * matched right to left.
 */

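private void compile() {
    // ... (earlier compilation steps elided)

    if (matchRoot instanceof Slice) {
        // The whole pattern is a literal: try to replace it with a Boyer-Moore (BnM) node.
        root = BnM.optimize(matchRoot);
        if (root == matchRoot) {
            // optimize() declined (e.g. the literal is too short), so fall back to
            // a Start node, which attempts a match at each input position.
            // StartS is the variant used when the pattern has supplementary (non-BMP) characters.
            root = hasSupplementary ? new StartS(matchRoot) : new Start(matchRoot);
        }
    } else if (matchRoot instanceof Begin || matchRoot instanceof First) {
        // The root is already a Begin or First node: use it directly, without a Start wrapper.
        root = matchRoot;
    } else {
        // General case: wrap the pattern in a Start/StartS node.
        root = hasSupplementary ? new StartS(matchRoot) : new Start(matchRoot);
    }
}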
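
To illustrate the idea in the quoted comment, here is a minimal sketch of the simpler Boyer–Moore–Horspool variant, which keeps only the "bad character" shift table; the JDK's BnM node additionally keeps a "good suffix" table, and HorspoolSketch and indexOf are hypothetical names. The pattern is compared right to left, and on a mismatch it is shifted ahead by an amount looked up for the text character under the pattern's last position.

```java
import java.util.HashMap;
import java.util.Map;

public final class HorspoolSketch {
    // Returns the index of the first occurrence of pattern in text, or -1.
    static int indexOf(String text, String pattern) {
        int m = pattern.length();
        int n = text.length();
        if (m == 0) return 0;

        // Bad-character table: for each char in pattern[0..m-2], the distance
        // from its last occurrence to the pattern's final position.
        Map<Character, Integer> shift = new HashMap<>();
        for (int i = 0; i < m - 1; i++) {
            shift.put(pattern.charAt(i), m - 1 - i);
        }

        int pos = 0;
        while (pos <= n - m) {
            // Compare right to left, as the quoted comment describes.
            int j = m - 1;
            while (j >= 0 && text.charAt(pos + j) == pattern.charAt(j)) {
                j--;
            }
            if (j < 0) return pos;
            // Characters absent from the pattern allow a full shift of m.
            pos += shift.getOrDefault(text.charAt(pos + m - 1), m);
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(indexOf("here is a simple example", "example")); // 17
    }
}
```

Skipping ahead like this only works for a fixed literal, which is consistent with the excerpt calling BnM.optimize only when matchRoot is a Slice.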

Answer 1, 2013-10-08 15:32:45
