Regular expressions are extremely slow
Pre: I'm trying to extract different types of parts from a big array using regexp. This operation is performed in an AsyncTask. part.plainname is a string, 256 characters maximum. item_pattern looks like "^keyword.*?$".
Problem: I found the method that slows everything down:
public int defineItemAmount(NFItem[] parts, String item_pattern){
    System.out.println("STAMP2");
    int casecount = 0;
    for (NFItem part : parts) {
        if (testItem(part.plainname, item_pattern))
            ++casecount;
    }
    System.out.println("STAMP3");
    return casecount;
}
public boolean testItem(String testString, String item_pattern){
    Pattern p = Pattern.compile(item_pattern);
    Matcher m = p.matcher(testString);
    return m.matches();
}
There are only 950 parts, but it runs horribly slowly:
02-25 11:34:51.773 1324-1343/com.nfe.unsert.dns_pc_creator I/System.out﹕ STAMP2
02-25 11:35:18.094 1324-1343/com.nfe.unsert.dns_pc_creator I/System.out﹕ STAMP3
More than 20 seconds just for the counting. testItem is called a lot, around 15 times the number of parts, so the whole app runs for more than 15 minutes, while almost the same Java program (not an Android app) finishes in less than 30 seconds.
Question: What am I doing wrong? Why is a simple regexp operation taking so long?
You can pre-compile the pattern:
public static int defineItemAmount(NFItem[] parts, String item_pattern){
    System.out.println("STAMP2");
    Pattern pattern = Pattern.compile(item_pattern);
    int casecount = 0;
    for (NFItem part : parts) {
        if (testItem(part.plainname, pattern))
            ++casecount;
    }
    System.out.println("STAMP3");
    return casecount;
}
public static boolean testItem(String testString, Pattern pattern){
    Matcher m = pattern.matcher(testString);
    return m.matches();
}
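Since the question says testItem is called with many different patterns, one way to extend the pre-compilation idea is to keep each compiled Pattern in a map keyed by its source string. The following is only a minimal sketch under assumptions: the class PatternCacheSketch and its methods compiled and countMatches are hypothetical names, plain String values stand in for part.plainname, and the cache is assumed to be used from a single thread.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original code.
class PatternCacheSketch {
    // Compiled patterns keyed by their source string.
    private static final Map<String, Pattern> CACHE = new HashMap<>();

    // Returns the compiled pattern, compiling it only on first use.
    // Not thread-safe: assumes all calls come from one thread.
    static Pattern compiled(String regex) {
        Pattern p = CACHE.get(regex);
        if (p == null) {
            p = Pattern.compile(regex);
            CACHE.put(regex, p);
        }
        return p;
    }

    // Counts the names that fully match the given pattern string.
    static int countMatches(String[] names, String regex) {
        Pattern p = compiled(regex); // looked up once, outside the loop
        int count = 0;
        for (String name : names) {
            if (p.matcher(name).matches()) {
                ++count;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String[] names = {"cpu intel", "cpu amd", "ram 8gb"};
        System.out.println(countMatches(names, "^cpu.*?$"));
    }
}
```

With this shape, repeated calls with the same pattern string pay the compilation cost only once, regardless of how many call sites use it.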
If you are looking for a string that begins with a keyword, you don't need to use the matches method with this kind of pattern ^keyword.*?$:
- Since the matches method is anchored by default, the anchors are not needed and you can remove them.
- You are only interested in the start of the string, so in this case the lookingAt method is more appropriate, since it doesn't care what happens at the end of the string.
- If keyword is a literal string and not a subpattern, don't use regex at all; use indexOf to check whether the keyword is at index 0.

You don't need to compile the pattern each time. Rather, do it once on initialisation.
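The three ways of testing for a leading keyword can be compared side by side. This is a hedged sketch: the class PrefixCheckSketch and its method names are hypothetical, and the literal "keyword" stands in for whatever the real prefix is. It uses String.startsWith for the non-regex check, which gives the same answer as indexOf(...) == 0 but stops at the prefix instead of scanning the whole string on a miss. The three checks agree for single-line input; matches with .* would differ on strings containing a line terminator, because . does not match one by default.

```java
import java.util.regex.Pattern;

// Hypothetical comparison of three prefix checks; not from the original post.
class PrefixCheckSketch {
    // Both patterns are compiled once, when the class is initialised.
    private static final Pattern FULL = Pattern.compile("keyword.*");
    private static final Pattern PREFIX = Pattern.compile("keyword");

    // matches() must consume the whole input, hence the trailing ".*".
    static boolean viaMatches(String s) {
        return FULL.matcher(s).matches();
    }

    // lookingAt() is anchored at the start only and ignores the rest.
    static boolean viaLookingAt(String s) {
        return PREFIX.matcher(s).lookingAt();
    }

    // No regex at all: valid only when the keyword is a literal string.
    static boolean viaStartsWith(String s) {
        return s.startsWith("keyword");
    }

    public static void main(String[] args) {
        String[] samples = {"keyword 123", "a keyword", "keyword"};
        for (String s : samples) {
            System.out.println(s + " -> " + viaMatches(s) + " "
                    + viaLookingAt(s) + " " + viaStartsWith(s));
        }
    }
}
```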
But, due to their generality, regular expressions are not fast, and they are not designed to be. You might be better off using a specific string splitting technique if the data are sufficiently regular.
Regexes are usually slow because a lot of things (like synchronization) are involved in their construction.
Don't call a separate method in the loop (which might prevent certain optimizations). Let the VM optimize the for loop. Use this and check performance:
Pattern p = Pattern.compile(item_pattern); // compile the pattern only once
int casecount = 0;
for (NFItem part : parts) {
    // match inline instead of calling testItem for every part
    if (p.matcher(part.plainname).matches())
        ++casecount;
}
...