Pooling issue: Item borrowed more than once

Question

I have a static utility method that I call a lot, which uses a java.util.regex.Matcher. Because the same regex is passed again and again, I try not to compile it every time, so I keep a Map where the key is the regex and the value is a Queue of Matcher objects (so that each thread gets its own Matcher instance).

How is it that the following code snippet manages to return the same Matcher twice... sometimes?

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Queue;
import java.util.Set;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MyTest {

    private static final Map<String, Queue<Matcher>> matchers = new HashMap<String, Queue<Matcher>>();
    private static Set<Integer> duplicateHunter = new HashSet<Integer>();

    private static Matcher getMatcher(String regexp, String value) {
        Queue<Matcher> matcherQueue = matchers.get(regexp);

        if (matcherQueue == null) {
            synchronized (matchers) {
                matcherQueue = matchers.get(regexp);

                if (matcherQueue == null) {
                    // Create a new thread-safe Queue and a new Matcher
                    matcherQueue = new ConcurrentLinkedQueue<Matcher>();
                    matchers.put(regexp, matcherQueue);
                } // Else: another thread already did what needed to be done
            }
        }

        // Try to retrieve a Matcher
        Matcher matcher = matcherQueue.poll();

        if (matcher == null) {
            // No matchers available, create one
            // No lock needed, as it's not a drama to have a few more matchers in the queue
            Pattern pattern = Pattern.compile(regexp);
            matcher = pattern.matcher(value);
            matcherQueue.offer(matcher);
        } else {
            // reset the matcher
            matcher.reset(value);
        }

//        boolean notADuplicate = duplicateHunter.add(matcher.hashCode());
//        if(!notADuplicate) {
//            throw new RuntimeException("DUPLICATE!!!");
//        }

        return matcher;
    }

    private static void returnMatcher(String regex, Matcher matcher) {
        Queue<Matcher> matcherQueue = matchers.get(regex);
        //duplicateHunter.remove(matcher.hashCode());
        matcherQueue.offer(matcher);
    }

    public static void main(String[] args) {

        for (int i = 0; i < 2; i++) {
            Thread thread = new Thread(new Runnable() {

                public void run() {
                    for (int i = 0; i < 50000; i++) {
                        String regex = ".*";
                        Matcher matcher = null;
                        try {
                            matcher = getMatcher(regex, "toto" + i);
                            if (matcher.matches()) {
                                // matches
                            }
                        } finally {
                            if (matcher != null) {
                                returnMatcher(regex, matcher);
                            }
                        }
                    }


                }
            });

            thread.start();
        }

    }
}

You'll get a "java.lang.StringIndexOutOfBoundsException: String index out of range". Enable the duplicateHunter code and you'll see that a Matcher is indeed returned twice sometimes.

(The static utility method isn't shown; the main method exists only to reproduce the problem.)

Answer 1

When there are no matchers for a regexp, you create a new matcher, but you also add it to the queue right away:

if (matcher == null) {
    // No matchers available, create one
    // No lock needed, as it's not a drama to have a few more matchers in the queue
    Pattern pattern = Pattern.compile(regexp);
    matcher = pattern.matcher(value);
    matcherQueue.offer(matcher); // Don't add it to the queue here!
}

Thus it will be in the queue while you are using it, and another thread could easily get a hold of it before you are done.
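
For illustration, here is a minimal sketch of a pool where a Matcher only enters the queue when it is handed back. The class and method names (MatcherPool, borrow, release) are hypothetical, and it assumes Java 8+ for computeIfAbsent. It also swaps the HashMap for a ConcurrentHashMap, since reading a plain HashMap outside the synchronized block while another thread writes to it is not safe either.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ConcurrentMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical pool: a Matcher is in the queue only while nobody is using it.
final class MatcherPool {

    private static final ConcurrentMap<String, Queue<Matcher>> POOLS =
            new ConcurrentHashMap<>();

    // Hands out an idle Matcher for the regex, or creates a fresh one.
    static Matcher borrow(String regexp, String value) {
        Queue<Matcher> queue =
                POOLS.computeIfAbsent(regexp, k -> new ConcurrentLinkedQueue<>());
        Matcher matcher = queue.poll();
        if (matcher == null) {
            // Nothing idle: create a new Matcher, but do NOT offer it to the queue here.
            return Pattern.compile(regexp).matcher(value);
        }
        // Reuse an idle Matcher on the new input.
        return matcher.reset(value);
    }

    // Puts the Matcher back; only now does it become visible to other threads.
    static void release(String regexp, Matcher matcher) {
        POOLS.computeIfAbsent(regexp, k -> new ConcurrentLinkedQueue<>())
             .offer(matcher);
    }
}
```

The caller keeps the same try/finally shape as in the question: borrow, use, then release in the finally block.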

I don't know if I agree with your idea of pooling matchers by the way. They are not very expensive to create in terms of CPU cycles. You probably want to profile it to see if it's worth it. Precompiling the Pattern is a good idea, however.
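
If profiling shows that pooling isn't worth it, one alternative is to cache only the compiled Pattern and create a new Matcher per call. This is a rough sketch under that assumption, not a measured recommendation; PatternCache and matcherFor are made-up names, and it assumes Java 8+. Pattern instances are documented as safe to share between threads, while Matcher instances are not.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical cache: compile each regex once, create a cheap Matcher per call.
final class PatternCache {

    private static final ConcurrentMap<String, Pattern> PATTERNS =
            new ConcurrentHashMap<>();

    // Returns a brand-new Matcher every time, so nothing needs to be given back.
    static Matcher matcherFor(String regexp, CharSequence value) {
        return PATTERNS.computeIfAbsent(regexp, Pattern::compile).matcher(value);
    }
}
```

With this approach there is no borrow/return protocol to get wrong, because no Matcher is ever shared.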

Answer 2

When you create a new Matcher, you offer it to the Queue before returning it, so the next thread gets it right away.

matcher = pattern.matcher(value);
matcherQueue.offer(matcher);        // <-- this line should be taken out and shot

...

return matcher;

Also, your duplicateHunter HashSet is not thread safe and may give you the wrong results when validating against it.
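
A thread-safe version of that check could look roughly like the sketch below. BorrowTracker and its methods are hypothetical names, and ConcurrentHashMap.newKeySet() assumes Java 8+. It stores the Matcher itself rather than its hashCode(); Matcher does not override equals/hashCode, so the set compares by identity and two distinct matchers with colliding hash codes are not mistaken for one.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.regex.Matcher;

// Hypothetical debugging aid: tracks which Matcher instances are currently borrowed.
final class BorrowTracker {

    // Concurrent set, safe to call from several threads at once.
    private static final Set<Matcher> IN_USE = ConcurrentHashMap.newKeySet();

    // Call right before handing a Matcher to a caller.
    static void markBorrowed(Matcher matcher) {
        if (!IN_USE.add(matcher)) {
            throw new IllegalStateException("Matcher borrowed twice: " + matcher);
        }
    }

    // Call right before putting the Matcher back in the pool.
    static void markReturned(Matcher matcher) {
        IN_USE.remove(matcher);
    }
}
```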

2 answers

solution1
4 ACCEPTED 2012-01-19 22:16:59

solution2
1 2012-01-19 22:18:08
