
Java String.split pass in precompiled regex for performance reasons

As the title states, given the following code:

public class Foo
{
   public static void main(String[] args)
   {  
         String test = "Cats go meow";  
         String[] tokens = test.split(" ");
   }
}

is it possible to precompile the regex passed to split, along these lines?

public class Foo
{  
   Pattern pattern = Pattern.compile(" ");
   public static void main(String[] args)
   {  
         String test = "Cats go meow";  
         String[] tokens = test.split(pattern);
   }
}

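Yes, it is possible, but the call goes the other way round: String has no split(Pattern) overload, so call pattern.split(test) rather than test.split(pattern). Also, make pattern static so that the static method main can access it.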

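import java.util.regex.Pattern;

public class Foo
{
   // Compiled once when the class is loaded; static so that main can access it.
   private static Pattern pattern = Pattern.compile(" ");

   public static void main(String[] args)
   {
         String test = "Cats go meow";
         // Call split on the Pattern, passing in the string to split.
         String[] tokens = pattern.split(test);
   }
}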

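According to the documentation for String.split, str.split(regex) yields the same result as Pattern.compile(regex).split(str). So you can use either String's split or Pattern's split, but since String's split in general compiles a Pattern and calls its split method, use Pattern directly to precompile the regex.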

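import java.util.regex.Pattern;

public class Foo
{
   // static final: the regex is compiled once and the reference never changes.
   private static final Pattern pattern = Pattern.compile(" ");

   public static void main(String[] args)
   {
         String test = "Cats go meow";
         String[] tokens = pattern.split(test);
   }
}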
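To check that the two calls really are interchangeable for the same regex, a minimal sketch along these lines may help. The class name SplitEquivalence, the helper names viaPattern and viaString, and the regex ",\\s*" are made up for illustration; only String.split and Pattern.split are actual JDK methods.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class SplitEquivalence {
    // Compiled once when the class is loaded, then reused on every call.
    private static final Pattern COMMA_SPACES = Pattern.compile(",\\s*");

    // Hypothetical helper: splits with the precompiled Pattern.
    static String[] viaPattern(String input) {
        return COMMA_SPACES.split(input);
    }

    // Hypothetical helper: splits with String.split, passing the same regex as text.
    static String[] viaString(String input) {
        return input.split(",\\s*");
    }

    public static void main(String[] args) {
        String test = "Cats, go,meow";
        System.out.println(Arrays.toString(viaPattern(test))); // [Cats, go, meow]
        System.out.println(Arrays.toString(viaString(test)));  // [Cats, go, meow]
    }
}
```

Both helpers return the same tokens; the difference is only in where the regex gets compiled.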

No, for a single-character separator like this one I think that would be a bad idea!

Looking closely at the source code of the split method, there is a shortcut (a fast path) for the case where the regex is a single character that is not a regex meta character:

public String[] split(String regex, int limit) {
    /* fastpath if the regex is a
     (1)one-char String and this character is not one of the
        RegEx's meta characters ".$|()[{^?*+\\", or
     (2)two-char String and the first char is the backslash and
        the second is not the ascii digit or ascii letter.
     */
    char ch = 0;
    if (((regex.value.length == 1 &&
         ".$|()[{^?*+\\".indexOf(ch = regex.charAt(0)) == -1) ||

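So split(" ") takes this fast path and does not go through the regex engine at all, which should make it faster than a precompiled Pattern.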
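To get a feel for which separators qualify, here is a rough sketch that mirrors only the two cases described in the quoted comment. FastPathCheck and looksLikeFastPath are hypothetical names, not part of the JDK, and since the excerpt above is cut off, the real method may apply further checks that this sketch does not reproduce.

```java
class FastPathCheck {
    // Regex meta characters as listed in the quoted JDK comment.
    private static final String META = ".$|()[{^?*+\\";

    // Hypothetical helper: true if the regex matches case (1) or (2) of the comment.
    static boolean looksLikeFastPath(String regex) {
        if (regex.length() == 1) {
            // Case (1): a single character that is not a regex meta character.
            return META.indexOf(regex.charAt(0)) == -1;
        }
        if (regex.length() == 2 && regex.charAt(0) == '\\') {
            // Case (2): a backslash followed by something other than an ASCII digit or letter.
            char c = regex.charAt(1);
            boolean asciiDigit = c >= '0' && c <= '9';
            boolean asciiLetter = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
            return !asciiDigit && !asciiLetter;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(looksLikeFastPath(" "));    // true
        System.out.println(looksLikeFastPath("."));    // false: '.' is a meta character
        System.out.println(looksLikeFastPath("\\."));  // true: escaped dot
        System.out.println(looksLikeFastPath("\\d"));  // false: backslash plus letter
        System.out.println(looksLikeFastPath("[ ]+")); // false: longer than two characters
    }
}
```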

On the other hand, for regexes that do not hit this fast path, it is a good idea to keep the compiled Pattern in a static final member.

edit:

The source code of String.split in JDK 1.7 and OpenJDK 7 seems to be identical; have a look yourselves (lines 2312 ff.).

So, for more complicated patterns (one or more spaces, for instance):

   static final Pattern pSpaces = Pattern.compile("[ ]+");

Use Pattern.split() instead:

String[] tokens = pSpaces.split(test);
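Put together, a small self-contained sketch of this approach could look as follows. The class name SpacesSplit and the helper tokenize are assumptions added for illustration; the pattern "[ ]+" is the one from the answer.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class SpacesSplit {
    // One or more spaces; compiled once and shared by all callers.
    static final Pattern pSpaces = Pattern.compile("[ ]+");

    // Hypothetical helper: splits a line on runs of spaces.
    static String[] tokenize(String line) {
        return pSpaces.split(line);
    }

    public static void main(String[] args) {
        String test = "Cats   go  meow";
        System.out.println(Arrays.toString(tokenize(test))); // [Cats, go, meow]
    }
}
```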


 