Regular expressions are extremely slow

Pre: I'm trying to extract different types of parts from a big array using regexp. This operation is performed in an AsyncTask. part.plainname is a string of at most 256 characters. item_pattern looks like "^keyword.*?$".

Problem: I found the method that slows everything down:

public int defineItemAmount(NFItem[] parts, String item_pattern){
    System.out.println("STAMP2");
    int casecount = 0;
    for (NFItem part : parts) {
        if (testItem(part.plainname, item_pattern))
            ++casecount;
    }
    System.out.println("STAMP3");
    return casecount;
}

public boolean testItem(String testString, String item_pattern){
    Pattern p = Pattern.compile(item_pattern);
    Matcher m = p.matcher(testString);
    return m.matches();
}

There are only 950 parts, but it runs horribly slowly:

02-25 11:34:51.773    1324-1343/com.nfe.unsert.dns_pc_creator I/System.out﹕ STAMP2

02-25 11:35:18.094    1324-1343/com.nfe.unsert.dns_pc_creator I/System.out﹕ STAMP3

That is about 26 seconds just for the counting. testItem is called a lot, roughly 15 times the number of parts, so the whole app runs for more than 15 minutes, while almost the same Java program (not an Android app) finishes in less than 30 seconds.

Question: What am I doing wrong? Why is a simple regexp operation taking so long?

You can pre-compile the pattern:

public static int defineItemAmount(NFItem[] parts, String item_pattern){
    System.out.println("STAMP2");
    Pattern pattern = Pattern.compile(item_pattern);
    int casecount = 0;
    for (NFItem part : parts) {
        if (testItem(part.plainname, pattern))
            ++casecount;
    }
    System.out.println("STAMP3");
    return casecount;
}

public static boolean testItem(String testString, Pattern pattern){
    Matcher m = pattern.matcher(testString);
    return m.matches();
}

If you are looking for a string that begins with a keyword, you don't need to use the matches method with this kind of pattern (^keyword.*?$):

  • First, the non-greedy quantifier is useless here and may slow down the regex engine for nothing; a greedy quantifier gives the same result.
  • Since the matches method is anchored by default, the anchors are not needed and you can remove them.
  • You are only interested in the beginning of the string, so the lookingAt method is more appropriate: it doesn't care what happens at the end of the string.
  • As other answers note, if the same pattern is used several times, compile it once and for all outside the testItem function. If that isn't the case, there is no need to compile it separately.
  • If keyword is a literal string and not a subpattern, don't use regex at all: use indexOf to check whether the keyword is at index 0 (String.startsWith performs the same check without scanning the rest of the string).
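The points above can be sketched as follows. This is only an illustration: the class name PrefixRegexDemo, its helper methods, and the literal "keyword" are placeholders and not part of the original code. It compares the original pattern used with matches, the simplified pattern without the anchors and the lazy quantifier, and a bare prefix pattern used with lookingAt, each compiled once.

```java
import java.util.regex.Pattern;

class PrefixRegexDemo {
    // Each pattern is compiled once and reused; "keyword" is a placeholder prefix.
    static final Pattern ORIGINAL = Pattern.compile("^keyword.*?$");
    static final Pattern SIMPLIFIED = Pattern.compile("keyword.*");
    static final Pattern PREFIX_ONLY = Pattern.compile("keyword");

    // Original approach: whole-string match with anchors and a lazy quantifier.
    static boolean viaOriginal(String s) {
        return ORIGINAL.matcher(s).matches();
    }

    // matches() already anchors at both ends, so the anchors can be dropped.
    static boolean viaSimplified(String s) {
        return SIMPLIFIED.matcher(s).matches();
    }

    // lookingAt() anchors only at the start and ignores the rest of the input.
    static boolean viaLookingAt(String s) {
        return PREFIX_ONLY.matcher(s).lookingAt();
    }

    public static void main(String[] args) {
        String[] samples = {"keyword 123", "keyword", "other keyword", ""};
        for (String s : samples) {
            System.out.println("\"" + s + "\": " + viaOriginal(s)
                    + " " + viaSimplified(s) + " " + viaLookingAt(s));
        }
    }
}
```

For single-line inputs such as these the three checks agree. They can differ when the string contains a line terminator, because "." does not match line terminators by default while lookingAt never inspects the rest of the string.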

You don't need to compile the pattern each time. Rather, do it once on initialisation.

But, due to their generality, regular expressions are not fast, and they are not designed to be. You might be better off using a specific string splitting technique if the data are sufficiently regular.
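As a rough sketch of a regex-free alternative, assuming the keyword is a plain literal prefix: the class PrefixCounter and the method countWithPrefix are hypothetical names, and a String[] stands in for the plainname fields of the NFItem array, whose definition is not shown in the question.

```java
class PrefixCounter {
    // Counts the names that start with the given literal keyword.
    // No Pattern or Matcher objects are created at all.
    static int countWithPrefix(String[] names, String keyword) {
        int count = 0;
        for (String name : names) {
            // Skip null entries defensively; startsWith compares only the prefix.
            if (name != null && name.startsWith(keyword)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String[] names = {"cpu intel", "cpu amd", "ram 8gb", null};
        System.out.println(countWithPrefix(names, "cpu"));
    }
}
```

This only applies when the data are regular enough that a literal prefix is all that needs checking; a real subpattern still needs a regex.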

  1. Regexes are usually slow because a lot of things (like synchronization) are involved in their construction.

  2. Don't call a separate method in the loop (which might prevent certain optimizations). Let the VM optimize the for loop. Use this and check the performance:

      Pattern p = Pattern.compile(item_pattern); // compile pattern only once
      for (NFItem part : parts) {
          // match inline instead of calling testItem
          Matcher m = p.matcher(part.plainname);
          if (m.matches())
              ++casecount;
      }
