I have a utility method (a static one) that I call a lot, which uses a java.util.regex.Matcher. As the regex passed is reused a lot, I try not to compile it every time, so I keep it in a Map where the key is the regex and the value is a List of Matcher objects (so that each thread gets its own Matcher instance).
How is it that the following code snippet manages to return the same Matcher twice... sometimes?
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Queue;
import java.util.Set;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MyTest {

    private static final Map<String, Queue<Matcher>> matchers = new HashMap<String, Queue<Matcher>>();
    private static Set<Integer> duplicateHunter = new HashSet<Integer>();

    private static Matcher getMatcher(String regexp, String value) {
        Queue<Matcher> matcherQueue = matchers.get(regexp);
        if (matcherQueue == null) {
            synchronized (matchers) {
                matcherQueue = matchers.get(regexp);
                if (matcherQueue == null) {
                    // Create a new thread-safe Queue and a new Matcher
                    matcherQueue = new ConcurrentLinkedQueue<Matcher>();
                    matchers.put(regexp, matcherQueue);
                } // Else: another thread already did what needed to be done
            }
        }
        // Try to retrieve a Matcher
        Matcher matcher = matcherQueue.poll();
        if (matcher == null) {
            // No matchers available, create one
            // No lock needed, as it's not a drama to have a few more matchers in the queue
            Pattern pattern = Pattern.compile(regexp);
            matcher = pattern.matcher(value);
            matcherQueue.offer(matcher);
        } else {
            // reset the matcher
            matcher.reset(value);
        }
        // boolean notADuplicate = duplicateHunter.add(matcher.hashCode());
        // if(!notADuplicate) {
        //     throw new RuntimeException("DUPLICATE!!!");
        // }
        return matcher;
    }

    private static void returnMatcher(String regex, Matcher matcher) {
        Queue<Matcher> matcherQueue = matchers.get(regex);
        //duplicateHunter.remove(matcher.hashCode());
        matcherQueue.offer(matcher);
    }

    public static void main(String[] args) {
        for (int i = 0; i < 2; i++) {
            Thread thread = new Thread(new Runnable() {
                public void run() {
                    for (int i = 0; i < 50000; i++) {
                        String regex = ".*";
                        Matcher matcher = null;
                        try {
                            matcher = getMatcher(regex, "toto" + i);
                            if (matcher.matches()) {
                                // matches
                            }
                        } finally {
                            if (matcher != null) {
                                returnMatcher(regex, matcher);
                            }
                        }
                    }
                }
            });
            thread.start();
        }
    }
}
You'll get a "java.lang.StringIndexOutOfBoundsException: String index out of range". Enable the duplicateHunter code and you'll see that a Matcher is indeed returned twice sometimes.
(The static utility method isn't shown; the main method was written to demonstrate the problem.)
When there are no matchers for a regexp, you create a new matcher, but you also add it to the queue right away:
    if (matcher == null) {
        // No matchers available, create one
        // No lock needed, as it's not a drama to have a few more matchers in the queue
        Pattern pattern = Pattern.compile(regexp);
        matcher = pattern.matcher(value);
        matcherQueue.offer(matcher); // Don't add it to the queue here!
    }
Thus it will be in the queue while you are using it, and another thread could easily get a hold of it before you are done.
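To make the fix concrete, here is a minimal sketch of the pool with the premature offer removed, so a Matcher only enters the queue when its user hands it back. The names MatcherPool, borrow and giveBack are made up for illustration and are not from the question. The sketch also swaps the question's HashMap plus synchronized block for a ConcurrentHashMap, which is an assumption on my part about what you would want, not something the original code does.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ConcurrentMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not the asker's class: a pool that never queues a Matcher that is in use.
final class MatcherPool {

    // Assumption: ConcurrentHashMap replaces the HashMap + synchronized block from the question.
    private static final ConcurrentMap<String, Queue<Matcher>> POOL =
            new ConcurrentHashMap<String, Queue<Matcher>>();

    static Matcher borrow(String regexp, String value) {
        Queue<Matcher> queue = POOL.get(regexp);
        if (queue == null) {
            Queue<Matcher> fresh = new ConcurrentLinkedQueue<Matcher>();
            Queue<Matcher> existing = POOL.putIfAbsent(regexp, fresh);
            queue = (existing != null) ? existing : fresh;
        }
        Matcher matcher = queue.poll();
        if (matcher == null) {
            // Nothing idle: create a Matcher and hand it out directly.
            // It is deliberately NOT offered to the queue here.
            return Pattern.compile(regexp).matcher(value);
        }
        // An idle Matcher was taken out of the queue; re-target it at the new input.
        return matcher.reset(value);
    }

    static void giveBack(String regexp, Matcher matcher) {
        // Only now does the Matcher become visible to other threads.
        // Assumes borrow(regexp, ...) was called before, so the queue exists.
        POOL.get(regexp).offer(matcher);
    }
}
```

Callers would pair borrow with giveBack in a try/finally, exactly as the main method in the question already does with getMatcher and returnMatcher.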
I don't know if I agree with your idea of pooling matchers, by the way. They are not very expensive to create in terms of CPU cycles. You probably want to profile it to see if it's worth it. Precompiling the Pattern is a good idea, however.
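If profiling shows that pooling isn't worth it, the simpler alternative could look roughly like this: cache only the compiled Pattern (which is immutable and safe to share between threads) and create a fresh Matcher per call. PatternCache and matcherFor are hypothetical names used only for this sketch.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: shares compiled Patterns, never shares Matchers.
final class PatternCache {

    private static final ConcurrentMap<String, Pattern> PATTERNS =
            new ConcurrentHashMap<String, Pattern>();

    static Matcher matcherFor(String regexp, String value) {
        Pattern pattern = PATTERNS.get(regexp);
        if (pattern == null) {
            // Two threads may both compile the same regex once; only one result is kept.
            Pattern compiled = Pattern.compile(regexp);
            Pattern existing = PATTERNS.putIfAbsent(regexp, compiled);
            pattern = (existing != null) ? existing : compiled;
        }
        // A new Matcher per call, so there is nothing to return and nothing to share.
        return pattern.matcher(value);
    }
}
```

With this shape there is no returnMatcher step at all, which removes the whole class of "handed out twice" bugs at the cost of one small allocation per call.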
When you create a new Matcher, you offer it to the Queue before returning it, so the next thread gets it right away.
    matcher = pattern.matcher(value);
    matcherQueue.offer(matcher); // <-- this line should be taken out and shot
    ...
    return matcher;
Also, your duplicateHunter HashSet is not thread-safe and may give you the wrong results when validating against it.
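One possible way to make the duplicate check trustworthy is to back it with a concurrent set. The sketch below is an assumption about how you might do it, not your code: DuplicateHunter, checkOut and checkIn are invented names, and it stores the Matcher objects themselves rather than their hashCode() values, since two distinct objects can share a hash code. Matcher does not override equals, so set membership here is by identity.

```java
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.regex.Matcher;

// Hypothetical debugging aid: tracks which Matchers are currently handed out.
final class DuplicateHunter {

    // Thread-safe set view over a ConcurrentHashMap.
    private static final Set<Matcher> IN_USE =
            Collections.newSetFromMap(new ConcurrentHashMap<Matcher, Boolean>());

    static void checkOut(Matcher matcher) {
        // add() returns false if this exact Matcher is already marked as in use.
        if (!IN_USE.add(matcher)) {
            throw new IllegalStateException("Matcher handed out twice");
        }
    }

    static void checkIn(Matcher matcher) {
        // Call this before the Matcher is put back into the queue.
        IN_USE.remove(matcher);
    }
}
```

You would call checkOut just before returning from getMatcher and checkIn at the top of returnMatcher, in the same places as the commented-out duplicateHunter lines.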