
High performance simple Java regular expressions

Part of the code I'm working on uses a bunch of regular expressions to search for some simple string patterns (e.g., patterns like "foo[0-9]{3,4} bar"). Currently, we use statically compiled Java Patterns and then call Pattern#matcher to check whether a string contains a match for the pattern (I don't need the match itself, just a boolean indicating whether there is one). This causes a noticeable amount of memory allocation, which is affecting performance.

Is there a better option for Java regex matching that is faster or at least doesn't allocate memory every time it searches a string for a pattern?

Try using the matcher.reset("newinputtext") method to avoid creating a new matcher every time you call Pattern.matcher.
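A minimal sketch of that idea, using the pattern from the question. The class name, the method name, and the use of a ThreadLocal are assumptions made for illustration: Matcher is not thread-safe, so a shared instance needs some per-thread arrangement, and a plain field would do for single-threaded code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not from the original post.
final class ReusableMatcherDemo {
    // Pattern from the question, compiled once.
    private static final Pattern PATTERN = Pattern.compile("foo[0-9]{3,4} bar");

    // One Matcher per thread, created lazily on first use.
    private static final ThreadLocal<Matcher> MATCHER =
            ThreadLocal.withInitial(() -> PATTERN.matcher(""));

    static boolean containsMatch(CharSequence input) {
        // reset(input) points the existing matcher at the new text
        // instead of allocating a new Matcher for every call.
        return MATCHER.get().reset(input).find();
    }

    public static void main(String[] args) {
        System.out.println(containsMatch("xx foo123 bar yy")); // prints true
        System.out.println(containsMatch("foo12 bar"));        // prints false
    }
}
```

Note that the matcher keeps a reference to the last input it was reset with, so a very large string stays reachable until the next call.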

If you expect fewer than 50% of lines to match your regex, you can first test for a literal substring with String.indexOf(), which is about 3 to 20 times faster than a regex matcher for simple sequences:

if (line.indexOf("foo") > -1 && pattern.matcher(line).matches()) {
    ...
}

If you add such heuristics to your code, remember to document them well, and verify with a profiler that the code really is faster than the simple version.
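As a rough sketch, the prefilter could be wrapped in a documented helper like the one below. The class and constant names are hypothetical. It uses find() rather than matches() because the question asks whether a line contains a match; swap in matches() if the whole line has to match.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, not from the original post.
final class PrefilterDemo {
    // Literal text that every match of PATTERN must contain.
    // Keep this in sync with PATTERN if the regex ever changes.
    private static final String REQUIRED_LITERAL = "foo";

    private static final Pattern PATTERN = Pattern.compile("foo[0-9]{3,4} bar");

    static boolean containsMatch(String line) {
        // Heuristic: lines without the literal cannot match, so they are
        // rejected by indexOf() and never reach the regex engine.
        return line.indexOf(REQUIRED_LITERAL) > -1 && PATTERN.matcher(line).find();
    }

    public static void main(String[] args) {
        System.out.println(containsMatch("see foo123 bar here")); // prints true
        System.out.println(containsMatch("nothing relevant"));    // prints false
    }
}
```

The shortcut only pays off when most lines lack the literal, which is why measuring with a profiler matters.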

If you want to avoid creating a new Matcher for each Pattern, use the usePattern() method, like so:

Pattern[] pats = {
  Pattern.compile("123"),
  Pattern.compile("abc"),
  Pattern.compile("foo")
};
String s = "123 abc";
Matcher m = Pattern.compile("dummy").matcher(s);
for (Pattern p : pats)
{
  System.out.printf("%s : %b%n", p.pattern(), m.reset().usePattern(p).find());
}

see the demo on Ideone

You have to use matcher's reset() method too, or find() will only search from the point where the previous match ended (assuming the match was successful).
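To make the reset() point concrete, here is a small sketch that runs the same two patterns over one string, once without and once with reset(). The class and method names are made up for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not from the original post.
final class UsePatternResetDemo {
    // Finds `first`, then switches to `second` WITHOUT resetting the matcher.
    static boolean findWithoutReset(String text, Pattern first, Pattern second) {
        Matcher m = first.matcher(text);
        m.find();
        // usePattern() keeps the matcher's position, so this search
        // starts where the previous match ended.
        return m.usePattern(second).find();
    }

    // Same as above, but resets the matcher before switching patterns.
    static boolean findWithReset(String text, Pattern first, Pattern second) {
        Matcher m = first.matcher(text);
        m.find();
        // reset() moves the search position back to the start of the input.
        return m.reset().usePattern(second).find();
    }

    public static void main(String[] args) {
        Pattern digits = Pattern.compile("123");
        Pattern letters = Pattern.compile("abc");
        String text = "abc 123";
        System.out.println(findWithoutReset(text, digits, letters)); // prints false
        System.out.println(findWithReset(text, digits, letters));    // prints true
    }
}
```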

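You could try the static Pattern.matches() method, which just returns the boolean. It doesn't hand a Matcher back to the caller, although internally it still compiles the regex and creates a matcher on every call, and it returns true only when the entire input matches, so it may not help much with the memory allocation issues.

That being said, the regex pattern would not be precompiled, so it would be a performance vs. resources trade-off at that point.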
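For comparison, a minimal sketch that puts the static call next to a precompiled pattern. The class and method names are hypothetical. Keep in mind that Pattern.matches(regex, input) compiles the regex on each call and succeeds only if the whole input matches, whereas find() looks for a match anywhere in the input.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not from the original post.
final class StaticMatchesDemo {
    private static final String REGEX = "foo[0-9]{3,4} bar";

    // Compiled once and reused across calls.
    private static final Pattern PRECOMPILED = Pattern.compile(REGEX);

    static boolean wholeStringMatches(String input) {
        // Convenience call: compiles REGEX again on every invocation
        // and requires the entire input to match.
        return Pattern.matches(REGEX, input);
    }

    static boolean containsMatch(String input) {
        // Reuses the compiled pattern and accepts a match anywhere in the input.
        return PRECOMPILED.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(wholeStringMatches("foo123 bar"));    // prints true
        System.out.println(wholeStringMatches("xx foo123 bar")); // prints false
        System.out.println(containsMatch("xx foo123 bar"));      // prints true
    }
}
```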
