
Does Pattern.compile cache?

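It is probably an implementation detail, but for the Oracle and IBM JDKs at least, is the compiled pattern cached, or do we as application developers need to cache compiled patterns ourselves?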

As far as I can tell from looking at the code (JDK 6), it doesn't do any caching. Once constructed, though, a Pattern object can be cached on the application side and shared among multiple threads. The standard idiom seems to be to assign it to a static final field:

private static final Pattern p = Pattern.compile(",");
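To make the sharing concrete, here is a minimal sketch, assuming a hypothetical class name (SharedPatternDemo) and helper (countFields) that are not part of the original answer. It relies on the documented guarantee that Pattern instances are immutable and safe for concurrent use, while Matcher instances are not, so each call creates its own matching state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class SharedPatternDemo {
    // Compiled once at class initialization and shared by all threads.
    private static final Pattern COMMA = Pattern.compile(",");

    // Pattern.split creates its own Matcher per call, so no state is shared.
    // A limit of -1 keeps trailing empty fields.
    static int countFields(String line) {
        return COMMA.split(line, -1).length;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (String line : new String[] {"a,b,c", "x,,y,", "single"}) {
                Future<Integer> f = pool.submit(() -> countFields(line));
                results.add(f);
            }
            for (Future<Integer> f : results) {
                System.out.println(f.get());
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

If a Matcher needs to be reused (for example via reset), it should stay confined to one thread; only the Pattern is meant to be shared.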

I don't believe the results are cached, and there's no evidence of such behaviour in the code or the documentation. It would (of course) be relatively trivial to implement such a cache yourself, but I would be interested in a use case in which such caching is beneficial.

Re. the comment below about String.split(): that method takes a different approach, in that the code follows a distinct path for trivial one- or two-character patterns versus more complex regexps. But it still doesn't appear to cache.
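As a rough illustration of that distinction, the sketch below contrasts a trivial delimiter passed to String.split() with a more complex delimiter that is compiled once and reused. The class and method names (SplitComparison, splitSimple, splitPrecompiled) and the delimiter regex are assumptions made for this example; whether a given JDK takes the fast path for the trivial case is an implementation detail.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class SplitComparison {
    // A non-trivial delimiter: comma or semicolon with optional surrounding whitespace.
    private static final Pattern DELIM = Pattern.compile("\\s*[,;]\\s*");

    // Trivial single-character delimiter handed straight to String.split().
    static String[] splitSimple(String s) {
        return s.split(",");
    }

    // Complex delimiter: reuse the precompiled Pattern instead of calling
    // s.split("\\s*[,;]\\s*") and handing the regex string over on every call.
    static String[] splitPrecompiled(String s) {
        return DELIM.split(s);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitSimple("a,b,c")));
        System.out.println(Arrays.toString(splitPrecompiled("a , b;c")));
    }
}
```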

I've created a class CachedPattern that can cache Pattern objects. If you run the main method you'll see that the Pattern objects returned by Java's Pattern.compile are in fact different instances, which also consumes memory.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.regex.Pattern;

public class CachedPattern {

    public static void main(String[] args) {
        Pattern p1 = Pattern.compile("abc");
        Pattern p2 = Pattern.compile("abc");

        Pattern x1 = CachedPattern.compile("abc");
        Pattern x2 = CachedPattern.compile("abc");
        Pattern x3 = CachedPattern.compile("abc");

        // Pattern does not override equals(), so equals() compares identity.
        // Are cached objects the same? YES!
        System.out.println(x1.equals(x2)); // true
        System.out.println(x1.equals(x3)); // true
        // Are non-cached objects the same? NO!
        System.out.println(p1.equals(p2)); // false
    }

    // ConcurrentHashMap so that the cache can be used from multiple threads.
    private static final Map<String, Pattern> cached = new ConcurrentHashMap<>();

    /**
     * Separator placed between the regex and the flags when building a cache key.
     * It must be something a user is unlikely to put inside a regex, so that a key
     * with flags does not collide with a key without flags.
     * For example, if UNIQUE_HASH were empty:
     *     compile(regex = "abc1")
     *          vs.
     *     compile(regex = "abc", flags = 1)
     * would both produce the key "abc1".
     */
    private static final String UNIQUE_HASH = "(())[]+@#$%^@!@#$%*";

    public static Pattern compile(String regex) {
        return cached.computeIfAbsent(regex, Pattern::compile);
    }

    public static Pattern compile(String regex, int flags) {
        String uniqueKey = regex + UNIQUE_HASH + flags;
        // The flags must be passed to Pattern.compile; the key alone does not apply them.
        return cached.computeIfAbsent(uniqueKey, key -> Pattern.compile(regex, flags));
    }
}
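One possible variation, sketched here rather than taken from the answer, is to key the cache on the (regex, flags) pair itself instead of a concatenated string, which removes the need for a separator such as UNIQUE_HASH. The class name PatternCache is hypothetical, and the cache is unbounded, so it only suits a small, fixed set of patterns.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.regex.Pattern;

// Hypothetical alternative to CachedPattern; names are illustrative only.
class PatternCache {
    // The key is an immutable (regex, flags) pair with value-based equals/hashCode.
    private static final Map<SimpleImmutableEntry<String, Integer>, Pattern> CACHE =
            new ConcurrentHashMap<>();

    // Pattern.compile(regex) is equivalent to compiling with flags == 0.
    static Pattern compile(String regex) {
        return compile(regex, 0);
    }

    static Pattern compile(String regex, int flags) {
        // computeIfAbsent compiles at most once per distinct key;
        // an invalid regex throws and leaves no entry behind.
        return CACHE.computeIfAbsent(
                new SimpleImmutableEntry<>(regex, flags),
                key -> Pattern.compile(key.getKey(), key.getValue()));
    }

    public static void main(String[] args) {
        Pattern a = compile("abc");
        Pattern b = compile("abc");
        Pattern c = compile("abc", Pattern.CASE_INSENSITIVE);
        System.out.println(a == b); // same cached instance
        System.out.println(a == c); // different flags, different instance
    }
}
```

For patterns known at compile time, a static final field remains the simpler choice; a map-based cache mainly helps when the regex strings arrive at runtime and repeat.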

It doesn't. If you have performance-sensitive areas, you might want to hold your Pattern objects as member variables.

Clojure does this more or less automatically when you have a regex in a function, though.

According to Joshua Bloch's Effective Java:

Some object creations are much more expensive than others. If you're going to need such an “expensive object” repeatedly, it may be advisable to cache it for reuse. Unfortunately, it's not always obvious when you're creating such an object. Suppose you want to write a method to determine whether a string is a valid Roman numeral. Here's the easiest way to do this using a regular expression:

// Performance can be greatly improved!
static boolean isRomanNumeral(String s) {
    return s.matches("^(?=.)M*(C[MD]|D?C{0,3})"
            + "(X[CL]|L?X{0,3})(I[XV]|V?I{0,3})$");
}

The problem with this implementation is that it relies on the String.matches method. While String.matches is the easiest way to check if a string matches a regular expression, it's not suitable for repeated use in performance-critical situations. The problem is that it internally creates a Pattern instance for the regular expression and uses it only once, after which it becomes eligible for garbage collection. Creating a Pattern instance is expensive because it requires compiling the regular expression into a finite state machine. To improve the performance, explicitly compile the regular expression into a Pattern instance (which is immutable) as part of class initialization, cache it, and reuse the same instance for every invocation of the isRomanNumeral method:

// Reusing expensive object for improved performance
import java.util.regex.Pattern;

public class RomanNumerals {
    private static final Pattern ROMAN = Pattern.compile(
            "^(?=.)M*(C[MD]|D?C{0,3})"
            + "(X[CL]|L?X{0,3})(I[XV]|V?I{0,3})$");

    static boolean isRomanNumeral(String s) {
        return ROMAN.matcher(s).matches();
    }
}

The improved version of isRomanNumeral provides significant performance gains if invoked frequently. On my machine, the original version takes 1.1 μs on an 8-character input string, while the improved version takes 0.17 μs, which is 6.5 times faster.
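To get a rough feel for the difference on your own machine, a crude timing loop along the following lines can be used. This is only a sketch: the class name RomanNumeralTiming and its helpers are hypothetical, the figures will vary with JVM and hardware, and a proper benchmark harness such as JMH would give more trustworthy numbers than System.nanoTime around a loop.

```java
import java.util.function.Predicate;
import java.util.regex.Pattern;

// Hypothetical timing sketch; not a rigorous benchmark.
class RomanNumeralTiming {
    private static final String REGEX =
            "^(?=.)M*(C[MD]|D?C{0,3})(X[CL]|L?X{0,3})(I[XV]|V?I{0,3})$";
    private static final Pattern ROMAN = Pattern.compile(REGEX);

    // Written on every iteration so the JIT cannot discard the calls.
    static volatile boolean sink;

    // Hands the regex string to String.matches on every call.
    static boolean viaStringMatches(String s) {
        return s.matches(REGEX);
    }

    // Reuses the Pattern compiled once at class initialization.
    static boolean viaCachedPattern(String s) {
        return ROMAN.matcher(s).matches();
    }

    // Average wall-clock nanoseconds per call over the given number of iterations.
    static double nanosPerCall(Predicate<String> check, String input, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink = check.test(input);
        }
        return (System.nanoTime() - start) / (double) iterations;
    }

    public static void main(String[] args) {
        String input = "MCMLXXVI"; // 8-character input, as in the quoted measurement
        int iterations = 1_000_000;
        // Warm-up runs so both paths are JIT-compiled before measuring.
        nanosPerCall(RomanNumeralTiming::viaStringMatches, input, iterations);
        nanosPerCall(RomanNumeralTiming::viaCachedPattern, input, iterations);
        System.out.printf("String.matches: %.1f ns/call%n",
                nanosPerCall(RomanNumeralTiming::viaStringMatches, input, iterations));
        System.out.printf("cached Pattern: %.1f ns/call%n",
                nanosPerCall(RomanNumeralTiming::viaCachedPattern, input, iterations));
    }
}
```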
