Pre: I'm trying to extract different types of parts from a big array using regexp. This operation is performed in an AsyncTask. part.plainname is a string of at most 256 characters. item_pattern looks like "^keyword.*?$".
Problem: I found the method that slows everything down:
public int defineItemAmount(NFItem[] parts, String item_pattern){
    System.out.println("STAMP2");
    int casecount = 0;
    for (NFItem part : parts) {
        if (testItem(part.plainname, item_pattern))
            ++casecount;
    }
    System.out.println("STAMP3");
    return casecount;
}

public boolean testItem(String testString, String item_pattern){
    Pattern p = Pattern.compile(item_pattern);
    Matcher m = p.matcher(testString);
    return m.matches();
}
There are only 950 parts, but it runs horribly slowly:
02-25 11:34:51.773 1324-1343/com.nfe.unsert.dns_pc_creator I/System.out﹕ STAMP2
02-25 11:35:18.094 1324-1343/com.nfe.unsert.dns_pc_creator I/System.out﹕ STAMP3
More than 20 seconds (about 26 s between the two log stamps) just for the counting. testItem is used a lot, around 15 * parts, so the whole app runs for more than 15 minutes, while an almost identical Java program (not an Android app) finishes in less than 30 seconds.
Question: What am I doing wrong? Why is a simple regexp operation taking so long?
You can pre-compile the pattern:
public static int defineItemAmount(NFItem[] parts, String item_pattern){
    System.out.println("STAMP2");
    Pattern pattern = Pattern.compile(item_pattern);
    int casecount = 0;
    for (NFItem part : parts) {
        if (testItem(part.plainname, pattern))
            ++casecount;
    }
    System.out.println("STAMP3");
    return casecount;
}

public static boolean testItem(String testString, Pattern pattern){
    Matcher m = pattern.matcher(testString);
    return m.matches();
}
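To check the difference yourself, here is a minimal self-contained sketch. It assumes a stripped-down, hypothetical NFItem with only a plainname field (the real class is not shown in the question) and made-up test data. It counts matches once with a compile-per-call helper and once with a precompiled Pattern, and prints the elapsed time of each. Absolute timings depend on the device and VM, and the first pass also absorbs some warm-up cost, so treat the numbers as a rough comparison only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PrecompileDemo {
    // Hypothetical stand-in for the question's NFItem; only plainname is used.
    static class NFItem {
        final String plainname;
        NFItem(String plainname) { this.plainname = plainname; }
    }

    // Question's approach: compiles the pattern again for every item.
    static int countCompileEachTime(NFItem[] parts, String itemPattern) {
        int count = 0;
        for (NFItem part : parts) {
            if (Pattern.compile(itemPattern).matcher(part.plainname).matches()) {
                ++count;
            }
        }
        return count;
    }

    // Answer's approach: compile once, reuse the same Pattern for every item.
    static int countPrecompiled(NFItem[] parts, String itemPattern) {
        Pattern pattern = Pattern.compile(itemPattern);
        int count = 0;
        for (NFItem part : parts) {
            Matcher m = pattern.matcher(part.plainname);
            if (m.matches()) {
                ++count;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Made-up data: 950 items, every third one starts with "keyword".
        NFItem[] parts = new NFItem[950];
        for (int i = 0; i < parts.length; i++) {
            parts[i] = new NFItem((i % 3 == 0 ? "keyword " : "other ") + i);
        }
        String itemPattern = "^keyword.*?$";

        long t0 = System.nanoTime();
        int a = countCompileEachTime(parts, itemPattern);
        long t1 = System.nanoTime();
        int b = countPrecompiled(parts, itemPattern);
        long t2 = System.nanoTime();

        // Both counts must agree; only the elapsed times should differ.
        System.out.println("compile each time: " + a + " matches, " + (t1 - t0) / 1_000 + " us");
        System.out.println("precompiled:       " + b + " matches, " + (t2 - t1) / 1_000 + " us");
    }
}
```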
If you are looking for a string that begins with a keyword, you don't need the matches method with a pattern like ^keyword.*?$:
- The matches method is anchored by default, so the anchors are not needed and you can remove them.
- The lookingAt method is more appropriate, since it doesn't care what happens at the end of the string.
- If keyword is a literal string and not a subpattern, don't use regex at all: use indexOf to check whether the keyword is at index 0 (String.startsWith performs the same check without scanning the rest of the string).
- You don't need to compile the pattern each time. Do it once, at initialisation.
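A small sketch of the three alternatives from the list above, using the placeholder keyword "keyword" (hypothetical, substitute your own). Pattern.quote is used so that a keyword containing regex metacharacters is still treated literally. One behavioural difference worth knowing: "." does not match line terminators by default, so the whole-string version rejects inputs with a newline after the keyword, while the prefix checks accept them.

```java
import java.util.regex.Pattern;

public class PrefixCheckDemo {
    // The question's pattern without ^ and $: matches() already requires the
    // whole input to match, and with matches() ".*?" and ".*" accept the same strings.
    static final Pattern FULL = Pattern.compile(Pattern.quote("keyword") + ".*");
    // Prefix-only pattern: lookingAt() needs a match only at the start of the input.
    static final Pattern PREFIX = Pattern.compile(Pattern.quote("keyword"));

    static boolean viaMatches(String s)    { return FULL.matcher(s).matches(); }
    static boolean viaLookingAt(String s)  { return PREFIX.matcher(s).lookingAt(); }
    // No regex at all: the keyword is a literal, so a plain prefix test is enough.
    static boolean viaStartsWith(String s) { return s.startsWith("keyword"); }

    public static void main(String[] args) {
        String[] samples = {"keyword 42", "keyword", "my keyword", "other", "keyword\nnext line"};
        for (String s : samples) {
            System.out.println(s.replace("\n", "\\n") + " -> matches=" + viaMatches(s)
                    + " lookingAt=" + viaLookingAt(s) + " startsWith=" + viaStartsWith(s));
        }
    }
}
```

For the common case of a single literal keyword, startsWith is the simplest choice; lookingAt is the one to keep if the prefix really is a regex.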
Due to their generality, regular expressions are not especially fast, and they are not designed to be. You might be better off using a specific string splitting technique if the data are sufficiently regular.
Regexes are usually slow because a lot of work (such as synchronization) goes into constructing them.
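Since construction is the expensive part, and the question's app calls testItem with more than one item_pattern, one option is to compile each distinct pattern string only once and reuse the result. A minimal sketch, assuming a hypothetical PatternCache helper (not part of any library). The Pattern Javadoc states that Pattern instances are immutable and safe for concurrent use while Matcher instances are not, so the cache shares Patterns across threads (for example, several AsyncTasks) but creates a fresh Matcher per call.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.regex.Pattern;

// Hypothetical helper: compiles each distinct regex string at most once.
public final class PatternCache {
    private static final Map<String, Pattern> CACHE = new ConcurrentHashMap<>();

    private PatternCache() {}

    // Returns the cached Pattern, compiling it on first use.
    public static Pattern get(String regex) {
        return CACHE.computeIfAbsent(regex, Pattern::compile);
    }

    // Drop-in replacement for the question's testItem(String, String):
    // same signature, but no recompilation for a pattern seen before.
    public static boolean testItem(String testString, String itemPattern) {
        return get(itemPattern).matcher(testString).matches();
    }

    public static void main(String[] args) {
        System.out.println(testItem("keyword 1", "^keyword.*?$"));
        System.out.println(testItem("other", "^keyword.*?$"));
        // Same regex string -> same compiled instance.
        System.out.println(get("^keyword.*?$") == get("^keyword.*?$"));
    }
}
```

On older Android API levels where Map.computeIfAbsent may not be available, a get followed by putIfAbsent on the ConcurrentHashMap achieves the same effect.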
Don't call a separate method inside the loop (it might prevent certain optimizations); let the VM optimize the for loop. Try this and check the performance:
Pattern p = Pattern.compile(item_pattern); // compile the pattern only once
int casecount = 0;
for (NFItem part : parts) {
    Matcher m = p.matcher(part.plainname); // match inline, no helper call
    if (m.matches())
        ++casecount;
}
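Going one step further, the Matcher itself can be reused: Matcher.reset(CharSequence) points an existing matcher at new input, so the loop does not create a new Matcher per item. A sketch assuming the same hypothetical, minimal NFItem (only plainname). Whether this helps measurably on a given device is something to measure rather than assume, and a reused Matcher must not be shared between threads.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatcherReuseDemo {
    // Hypothetical stand-in for the question's NFItem.
    static class NFItem {
        final String plainname;
        NFItem(String plainname) { this.plainname = plainname; }
    }

    static int defineItemAmount(NFItem[] parts, String itemPattern) {
        Pattern p = Pattern.compile(itemPattern); // compile once
        Matcher m = p.matcher("");                // create a single matcher
        int casecount = 0;
        for (NFItem part : parts) {
            // reset(input) retargets the same matcher at the next string
            if (m.reset(part.plainname).matches()) {
                ++casecount;
            }
        }
        return casecount;
    }

    public static void main(String[] args) {
        NFItem[] parts = {
            new NFItem("keyword one"), new NFItem("other"), new NFItem("keyword two")
        };
        System.out.println(defineItemAmount(parts, "^keyword.*?$"));
    }
}
```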