Tracking concurrent file processing in Java web app

Question

I have a Java 1.5 web application that converts arbitrary PDF files to images. It takes too long to process all pages of even a single PDF in one shot, so I want to process pages on demand.

I've read that I can use an ExecutorService to launch/queue the image generation operation in a new thread as HTTP requests for particular pages arrive. How do I ensure that I'm not queueing duplicate operations (e.g., two users request the same page of the same PDF) without resorting to a single-thread executor? How can I use something like a synchronized list to track which images the worker threads are processing, or what type of synchronization mechanism can help me track this?

Answer 1

You can use a ConcurrentSkipListSet or a ConcurrentHashMap to track which PDFs have been processed (and are presumably cached) or are currently being processed. Use a ConcurrentLinkedQueue for your PDF-to-image requests. When a worker thread pulls a request off the queue, it tries to add the request to the Set/Map: if the add succeeds, the thread processes the request; if the add fails, the request was already in the container and the thread skips it.
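A minimal sketch of this queue-plus-set design, assuming page keys of the hypothetical form "report.pdf#3". Because the question targets Java 1.5 and ConcurrentSkipListSet only arrived in Java 6, the sketch uses ConcurrentHashMap.putIfAbsent as the atomic "add to set". The names PageRequestTracker, PageConverter, claim and drain are illustrative, not part of any library.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ConcurrentMap;

// Hypothetical tracker for the queue + set design; keys look like "report.pdf#3".
class PageRequestTracker {
    // Hypothetical callback standing in for the real PDF-to-image conversion.
    interface PageConverter {
        void convert(String pageKey);
    }

    // Incoming requests, possibly containing duplicates.
    private final ConcurrentLinkedQueue<String> requests =
            new ConcurrentLinkedQueue<String>();
    // Keys being processed or already processed; the map is used as a set.
    private final ConcurrentMap<String, Boolean> claimed =
            new ConcurrentHashMap<String, Boolean>();

    void enqueue(String pageKey) {
        requests.offer(pageKey);
    }

    // Atomic "add to set": only the first caller for a given key gets true.
    boolean claim(String pageKey) {
        return claimed.putIfAbsent(pageKey, Boolean.TRUE) == null;
    }

    // Worker-thread loop: convert every queued page that no other worker has claimed.
    // Safe to call from several threads at once. Returns how many pages this call converted.
    int drain(PageConverter converter) {
        int converted = 0;
        String key;
        while ((key = requests.poll()) != null) {
            if (claim(key)) {
                converter.convert(key); // add succeeded: this thread owns the work
                converted++;
            }
            // add failed: the page was already claimed, so this request is dropped
        }
        return converted;
    }
}
```

Since nothing ever removes a key, the set also records "already processed" pages; on Java 6 or later, claim could simply return the result of ConcurrentSkipListSet.add(key).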

Answer 2

You could use a ConcurrentHashMap<String, Future<String>> with a PDF identifier (e.g., the file path) as the key and a Future representing the conversion task itself as the value.

The putIfAbsent method of ConcurrentHashMap provides the atomic compare-and-set (check-then-insert) operation, and the isDone method of Future indicates whether the conversion has finished.
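To see the two building blocks in isolation, here is a small sketch. fakeConversion stands in for a real conversion and register is a hypothetical helper; the point is that putIfAbsent returns null only for the first caller and hands every later caller the already-registered task, while isDone() stays false until something actually runs that task.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;

class PutIfAbsentDemo {
    // Hypothetical stand-in for a PDF-page conversion: just returns a fixed result.
    static FutureTask<String> fakeConversion(final String result) {
        return new FutureTask<String>(new Callable<String>() {
            public String call() {
                return result;
            }
        });
    }

    // Returns the task registered for pdfId after this call: the caller's
    // candidate if it got there first, otherwise the task registered earlier.
    static Future<String> register(ConcurrentMap<String, Future<String>> conversions,
                                   String pdfId, Future<String> candidate) {
        Future<String> existing = conversions.putIfAbsent(pdfId, candidate);
        return existing == null ? candidate : existing;
    }

    public static void main(String[] args) throws Exception {
        ConcurrentMap<String, Future<String>> conversions =
                new ConcurrentHashMap<String, Future<String>>();
        FutureTask<String> first = fakeConversion("doc-p1.png");
        FutureTask<String> second = fakeConversion("doc-p1-dup.png");

        Future<String> a = register(conversions, "doc.pdf#1", first);  // stored
        Future<String> b = register(conversions, "doc.pdf#1", second); // rejected
        System.out.println(a == b);     // true: both callers share the first task
        System.out.println(a.isDone()); // false: nothing has run it yet
        first.run();                    // run inline; normally an executor does this
        System.out.println(a.isDone()); // true
        System.out.println(a.get());    // doc-p1.png
    }
}
```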

When putIfAbsent returns null, no conversion task existed yet for the given PDF, so you need to hand your newly created conversion task to the ExecutorService to start it; otherwise skip this step and wait for the already existing task to finish. Create the task without starting it and schedule it only after putIfAbsent has stored it: submitting first would launch a duplicate conversion whenever two requests race.

Mockup:

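Callable<String> conversion = ...; // your PDF-to-image logic
FutureTask<String> newTask = new FutureTask<String>(conversion); // created, not started
Future<String> conversionTask = conversions.putIfAbsent(pdfId, newTask);
if (conversionTask == null) {
    // No task existed for this PDF yet, so ours was stored: start it.
    conversionTask = newTask;
    executor.execute(newTask); // FutureTask is a Runnable
}
// Either way, conversion is scheduled by now.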

The ExecutorService takes care of queueing your conversion requests.
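Putting the pieces together, here is a minimal, self-contained sketch of this approach using only Java 5-era java.util.concurrent classes. The names PageConversionService, PageRenderer, requestPage and the "path#page" key format are hypothetical. The essential point is that the FutureTask is created unstarted, stored with putIfAbsent, and handed to the executor only by the thread whose putIfAbsent returned null.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;

class PageConversionService {
    // Hypothetical hook for the real conversion; returns the generated image's path.
    interface PageRenderer {
        String render(String pdfPath, int page) throws Exception;
    }

    private final ConcurrentMap<String, Future<String>> conversions =
            new ConcurrentHashMap<String, Future<String>>();
    private final ExecutorService executor;
    private final PageRenderer renderer;

    PageConversionService(ExecutorService executor, PageRenderer renderer) {
        this.executor = executor;
        this.renderer = renderer;
    }

    // Returns the single Future for this page, scheduling the conversion at most
    // once no matter how many threads ask for the same page concurrently.
    Future<String> requestPage(final String pdfPath, final int page) {
        String key = pdfPath + "#" + page;
        Future<String> existing = conversions.get(key);
        if (existing != null) {
            return existing; // fast path: already scheduled or finished
        }
        FutureTask<String> task = new FutureTask<String>(new Callable<String>() {
            public String call() throws Exception {
                return renderer.render(pdfPath, page);
            }
        });
        existing = conversions.putIfAbsent(key, task);
        if (existing != null) {
            return existing; // another request won the race; don't start a duplicate
        }
        executor.execute(task); // only the winning thread schedules the work
        return task;
    }
}
```

A request handler would call requestPage for the requested page and then wait on, or poll, the returned Future. Two caveats of this sketch: finished Futures stay in the map, so it doubles as an unbounded cache; and a failed conversion also stays there, so you may want to evict it with the two-argument conversions.remove(key, task) after catching the ExecutionException, letting a later request retry.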

Once a conversion completes, you can retrieve the result via the Future.get() method, which blocks until the result is available.
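Because get() blocks, an HTTP handler may prefer not to wait indefinitely. A minimal sketch with the hypothetical helpers imageIfReady and awaitImage: isDone() gives a non-blocking check, and the timed get(long, TimeUnit) bounds the wait so the request can answer "still processing, retry later" instead of tying up a request thread.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helpers an HTTP handler could use with the Future for a page.
class PageResult {
    // Non-blocking: the image path if the conversion has finished, otherwise null.
    // Throws ExecutionException if the finished conversion failed.
    static String imageIfReady(Future<String> conversion)
            throws InterruptedException, ExecutionException {
        return conversion.isDone() ? conversion.get() : null;
    }

    // Bounded wait: blocks for at most maxWaitMillis, then gives up and returns null
    // so the handler can tell the client to try again later.
    static String awaitImage(Future<String> conversion, long maxWaitMillis)
            throws InterruptedException, ExecutionException {
        try {
            return conversion.get(maxWaitMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            return null;
        }
    }
}
```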

Please note that the Java EE specification does not permit application components to spawn their own threads. A common approach is to move the asynchronous processing into a separate JMS-based service; Apache Camel can help here.

Answer 1: score 1, posted 2013-04-14 01:20:22
Answer 2 (accepted): score 1, posted 2013-04-14 09:31:38
