Regular expressions are very slow

Question

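Background: I am trying to use regular expressions to extract different types of parts from a large array. The operation runs in an AsyncTask. part.plainname is a String of at most 256 characters. item_pattern looks like "^keyword.*?$".

Problem: I found the method that slows everything down: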

public int defineItemAmount(NFItem[] parts, String item_pattern){
    System.out.println("STAMP2");
    int casecount = 0;
    for (NFItem part : parts) {
        if (testItem(part.plainname, item_pattern))
            ++casecount;
    }
    System.out.println("STAMP3");
    return casecount;
}

public boolean testItem(String testString, String item_pattern){
    Pattern p = Pattern.compile(item_pattern);
    Matcher m = p.matcher(testString);
    return m.matches();
}

There are only 950 parts, but it runs very slowly:

02-25 11:34:51.773    1324-1343/com.nfe.unsert.dns_pc_creator I/System.out﹕ STAMP2

02-25 11:35:18.094    1324-1343/com.nfe.unsert.dns_pc_creator I/System.out﹕ STAMP3

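That is more than 20 seconds just for counting. testItem is called about 15 * (number of parts) times, so the whole application runs for more than 15 minutes, while almost the same Java program (not an Android app) finishes in 30 seconds.

Question: What am I doing wrong? Why does a simple regex operation take so long?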

Answer 1

You can precompile the pattern:

public static int defineItemAmount(NFItem[] parts, String item_pattern){
    System.out.println("STAMP2");
    Pattern pattern = Pattern.compile(item_pattern);
    int casecount = 0;
    for (NFItem part : parts) {
        if (testItem(part.plainname, pattern))
            ++casecount;
    }
    System.out.println("STAMP3");
    return casecount;
}

public static boolean testItem(String testString, Pattern pattern){
    Matcher m = pattern.matcher(testString);
    return m.matches();
}
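To get a feel for what per-call compilation costs, a rough timing sketch along these lines may help. It is not a rigorous benchmark (single run, no warm-up), and the class name CompileCostDemo, the method names, and the generated strings are invented for illustration; plain strings stand in for part.plainname.

```java
import java.util.regex.Pattern;

class CompileCostDemo {

    // Compiles the pattern on every call, as the code in the question does.
    static int countCompilingEachTime(String[] names, String regex) {
        int count = 0;
        for (String name : names) {
            if (Pattern.compile(regex).matcher(name).matches()) {
                ++count;
            }
        }
        return count;
    }

    // Compiles the pattern once and reuses it for every string.
    static int countPrecompiled(String[] names, String regex) {
        Pattern pattern = Pattern.compile(regex);
        int count = 0;
        for (String name : names) {
            if (pattern.matcher(name).matches()) {
                ++count;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Hypothetical data: 950 strings, every other one starts with "keyword".
        String[] names = new String[950];
        for (int i = 0; i < names.length; i++) {
            names[i] = (i % 2 == 0 ? "keyword item " : "other item ") + i;
        }
        String regex = "^keyword.*?$";

        long t0 = System.nanoTime();
        int a = countCompilingEachTime(names, regex);
        long t1 = System.nanoTime();
        int b = countPrecompiled(names, regex);
        long t2 = System.nanoTime();

        System.out.println("compile each time: " + a + " matches, " + (t1 - t0) / 1000 + " us");
        System.out.println("precompiled:       " + b + " matches, " + (t2 - t1) / 1000 + " us");
    }
}
```

Both methods return the same count; only the elapsed time differs, and the gap will vary by device and runtime.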

Answer 2

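If you want to find strings that start with a keyword, you don't need the matches method with a pattern like ^keyword.*?$:

First, the non-greedy quantifier is useless and may slow down the regex engine for nothing; a greedy quantifier gives you the same result.
The anchors are not needed because the matches method is anchored by default, so you can remove them.
You are only interested in the start of the string, so the lookingAt method is more appropriate here, since it does not care what happens at the end of the string.
As the other answers point out, if you use the same pattern several times, compile it once, outside the testItem function. If that is not the case, don't compile it at all.
If keyword is a literal string and not a subpattern, don't use a regex at all; use indexOf to check whether the keyword is at index 0.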
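The last three points can be sketched as follows. This is a minimal illustration, not the asker's code: the class name PrefixCheckDemo and its method names are made up, and "keyword" stands in for whatever the real pattern contains. The literal variant uses String.startsWith, which performs the same check as indexOf(...) == 0 without searching the rest of the string when the prefix is absent.

```java
import java.util.regex.Pattern;

class PrefixCheckDemo {

    // Compiled once. With lookingAt() no anchors and no trailing ".*" are needed.
    private static final Pattern KEYWORD = Pattern.compile("keyword");

    // Regex variant: lookingAt() must match at the start of the input
    // but does not require the match to cover the whole string.
    static boolean startsWithKeywordRegex(String s) {
        return KEYWORD.matcher(s).lookingAt();
    }

    // Literal variant: no regex involved at all.
    static boolean startsWithKeywordLiteral(String s) {
        return s.startsWith("keyword");
    }

    public static void main(String[] args) {
        String[] samples = {"keyword abc", "a keyword", "keyword"};
        for (String s : samples) {
            System.out.println(s + " -> regex=" + startsWithKeywordRegex(s)
                    + ", literal=" + startsWithKeywordLiteral(s));
        }
    }
}
```

One behavioural difference worth knowing: the original pattern with matches() rejects strings containing a line break, because "." does not match line terminators by default, whereas lookingAt() and startsWith() only look at the prefix.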

Answer 3

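You don't need to compile the pattern every time. Do it once, at initialization.

That said, regular expressions are not especially fast because of their generality, and they were never designed to be. If the data is regular enough, purpose-built string-splitting techniques may work better.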
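As a sketch of what a string-splitting approach might look like, suppose (purely as an assumption, since the question does not show the data) that each name starts with its type followed by a space. The class FirstTokenDemo and its methods are hypothetical.

```java
class FirstTokenDemo {

    // Returns the text before the first space, or the whole string if there is no space.
    static String firstToken(String name) {
        int space = name.indexOf(' ');
        return space < 0 ? name : name.substring(0, space);
    }

    // Counts the names whose first token equals the given type exactly.
    static int countByType(String[] names, String type) {
        int count = 0;
        for (String name : names) {
            if (firstToken(name).equals(type)) {
                ++count;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String[] names = {"keyword alpha", "other beta", "keyword", "keywordX gamma"};
        System.out.println(countByType(names, "keyword"));
    }
}
```

Unlike a prefix test, this compares the whole first token, so "keywordX gamma" is not counted under "keyword". Whether that is the desired behaviour depends on the real data format.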

Answer 4

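Regular expressions are generally slow because a lot is involved in constructing them (synchronization, for example).

Don't call a separate method inside the loop (this may prevent some optimizations). Let the VM optimize the for loop. Use this and check the performance:

Pattern p = Pattern.compile(item_pattern); // compile the pattern only once
int casecount = 0;
for (NFItem part : parts) {
    // match inline instead of calling testItem
    Matcher m = p.matcher(part.plainname);
    if (m.matches())
        ++casecount;
}
// ...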

4 answers

Answer 1
1, accepted, 2015-02-25 12:36:57

Answer 2
1, 2015-02-25 12:56:20

Answer 3
0, 2015-02-25 12:37:06

Answer 4
0, 2015-02-25 12:40:12
