Parallel database calls using Java 8 streams and CompletableFuture

Question

I would like to replicate and parallelize the following behavior with Java 8 streams:

for (Animal animal : animalList) {
        // find all other animals with the same breed
        Collection<Animal> queryResult = queryDatabase(animal.getBreed());

        if (animal.getSpecie() == cat) {
            catList.addAll(queryResult);
        } else {
            dogList.addAll(queryResult);
        }
}

This is what I have so far:

final Executor queryExecutor =
        Executors.newFixedThreadPool(Math.min(animalList.size(), 10),
                new ThreadFactory(){
                    public Thread newThread(Runnable r){
                        Thread t = new Thread(r);
                        t.setDaemon(true);
                        return t;
                    }
                });

List<CompletableFuture<Collection<Animal>>> listFutureResult =  animalList.stream()
        .map(animal -> CompletableFuture.supplyAsync(
                () -> queryDatabase(animal.getBreed()), queryExecutor))
        .collect(Collectors.toList());

List<Animal> result = listFutureResult.stream()
        .map(CompletableFuture::join)
        .flatMap(subList -> subList.stream())
        .collect(Collectors.toList());

1 - I'm not sure how to split the stream so that I can get 2 different animal lists, one for cats and one for dogs.

2 - Does this solution look reasonable?

Answer 1

First, consider just using

List<Animal> result = animalList.parallelStream()
    .flatMap(animal -> queryDatabase(animal.getBreed()).stream())
    .collect(Collectors.toList());

even though it won't give you the desired concurrency of up to ten; the simplicity might compensate for that. As for splitting the result into two lists, it's as easy as

Map<Boolean,List<Animal>> result = animalList.parallelStream()
    .flatMap(animal -> queryDatabase(animal.getBreed()).stream())
    .collect(Collectors.partitioningBy(animal -> animal.getSpecie() == cat));
List<Animal> catList = result.get(true), dogList = result.get(false);

In case you have more species than just cats and dogs, you may use Collectors.groupingBy(Animal::getSpecie) to get a map from species to list of animals.
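
A minimal sketch of the groupingBy variant, for illustration only. The Species enum, the Animal class and the in-memory queryDatabase are hypothetical stand-ins for the question's types, which are not shown in the thread. Note that, unlike partitioningBy, which always yields both the true and the false key, groupingBy only contains keys for species that actually occur in the results.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class GroupBySpecies {
    // Hypothetical stand-ins for the question's types.
    enum Species { CAT, DOG, RABBIT }

    static final class Animal {
        private final Species specie;
        private final String breed;

        Animal(Species specie, String breed) {
            this.specie = specie;
            this.breed = breed;
        }

        Species getSpecie() { return specie; }
        String getBreed() { return breed; }
    }

    // Hypothetical in-memory table replacing the real database.
    static final List<Animal> TABLE = Arrays.asList(
            new Animal(Species.CAT, "siamese"),
            new Animal(Species.CAT, "siamese"),
            new Animal(Species.DOG, "beagle"),
            new Animal(Species.RABBIT, "lop"));

    // Assumed query: returns every stored animal of the given breed.
    static Collection<Animal> queryDatabase(String breed) {
        return TABLE.stream()
                .filter(a -> a.getBreed().equals(breed))
                .collect(Collectors.toList());
    }

    // One map entry per species that occurs in the query results.
    static Map<Species, List<Animal>> groupBySpecies(List<Animal> animalList) {
        return animalList.parallelStream()
                .flatMap(animal -> queryDatabase(animal.getBreed()).stream())
                .collect(Collectors.groupingBy(Animal::getSpecie));
    }

    public static void main(String[] args) {
        List<Animal> input = Arrays.asList(
                new Animal(Species.CAT, "siamese"),
                new Animal(Species.DOG, "beagle"));
        groupBySpecies(input).forEach(
                (species, animals) -> System.out.println(species + ": " + animals.size()));
    }
}
```

The cat and dog lists are then simply map.get(Species.CAT) and map.get(Species.DOG), each of which may be null if no animal of that species was found.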

If you insist on using your own thread pool, a few things can be improved:

Executor queryExecutor = Executors.newFixedThreadPool(Math.min(animalList.size(), 10),
    r -> {
        Thread t = new Thread(r);
        t.setDaemon(true);
        return t;
    });
List<Animal> result =  animalList.stream()
    .map(animal -> CompletableFuture.completedFuture(animal.getBreed())
        .thenApplyAsync(breed -> queryDatabase(breed), queryExecutor))
    .collect(Collectors.toList()).stream()
    .flatMap(cf -> cf.join().stream())
    .collect(Collectors.toList());

Your supplyAsync variant requires capturing the actual Animal instance, creating a new Supplier for each animal. In contrast, the function passed to thenApplyAsync is invariant, performing the same operation for each parameter value. The code above assumes that getBreed is a cheap operation; otherwise, it wouldn't be hard to pass the Animal instance to completedFuture and call getBreed() within the async function instead.
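
The alternative mentioned in the last sentence could look roughly like the sketch below, where getBreed() runs on the pool thread together with the query. The Animal class and the in-memory queryDatabase are hypothetical placeholders for the question's types, and the pool size of 2 in main is arbitrary.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

public class AsyncBreedLookup {
    // Hypothetical stand-in for the question's Animal type.
    static final class Animal {
        private final String breed;

        Animal(String breed) { this.breed = breed; }

        // Imagine this being expensive, e.g. lazily loaded.
        String getBreed() { return breed; }
    }

    // Hypothetical in-memory table replacing the real database.
    static final List<Animal> TABLE = Arrays.asList(
            new Animal("siamese"), new Animal("siamese"), new Animal("beagle"));

    // Assumed query: returns every stored animal of the given breed.
    static Collection<Animal> queryDatabase(String breed) {
        return TABLE.stream()
                .filter(a -> a.getBreed().equals(breed))
                .collect(Collectors.toList());
    }

    static List<Animal> queryAll(List<Animal> animalList, Executor queryExecutor) {
        return animalList.stream()
                // The Animal itself is the future's value; the function below
                // still captures nothing and is the same for every element.
                .map(animal -> CompletableFuture.completedFuture(animal)
                        .thenApplyAsync(a -> queryDatabase(a.getBreed()), queryExecutor))
                // Collect first so all queries are submitted before any join().
                .collect(Collectors.toList()).stream()
                .flatMap(cf -> cf.join().stream())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        ExecutorService queryExecutor = Executors.newFixedThreadPool(2);
        try {
            List<Animal> input = Arrays.asList(new Animal("siamese"), new Animal("beagle"));
            System.out.println(queryAll(input, queryExecutor).size());
        } finally {
            queryExecutor.shutdown();
        }
    }
}
```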

The .map(CompletableFuture::join) can be replaced by a simple chained .join() within the flatMap function. Otherwise, if you prefer method references, you should use them consistently, i.e. .map(CompletableFuture::join).flatMap(Collection::stream).

Of course, this variant also allows using partitioningBy instead of toList.

As a final note, if you invoke shutdown on the executor service after use, there is no need to mark the threads as daemon:

ExecutorService queryExecutor = Executors.newFixedThreadPool(Math.min(animalList.size(), 10));
Map<Boolean,List<Animal>> result =  animalList.stream()
    .map(animal -> CompletableFuture.completedFuture(animal.getBreed())
        .thenApplyAsync(breed -> queryDatabase(breed), queryExecutor))
    .collect(Collectors.toList()).stream()
    .flatMap(cf -> cf.join().stream())
    .collect(Collectors.partitioningBy(animal -> animal.getSpecie() == cat));
List<Animal> catList = result.get(true), dogList = result.get(false);
queryExecutor.shutdown();
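
One caveat with the snippet above: if a query fails, join() throws a CompletionException and the shutdown() call is never reached, so the non-daemon pool threads would keep the JVM alive. A possible way to harden it is a try/finally, sketched below. The Species enum, the Animal class and the in-memory queryDatabase (which fails for the breed "unknown") are hypothetical stand-ins. The sketch also guards the pool size, since newFixedThreadPool throws an IllegalArgumentException for a size of 0, which Math.min(animalList.size(), 10) produces for an empty list.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

public class ShutdownInFinally {
    // Hypothetical stand-ins for the question's types.
    enum Species { CAT, DOG }

    static final class Animal {
        private final Species specie;
        private final String breed;

        Animal(Species specie, String breed) {
            this.specie = specie;
            this.breed = breed;
        }

        Species getSpecie() { return specie; }
        String getBreed() { return breed; }
    }

    // Hypothetical in-memory table replacing the real database.
    static final List<Animal> TABLE = Arrays.asList(
            new Animal(Species.CAT, "siamese"),
            new Animal(Species.CAT, "siamese"),
            new Animal(Species.DOG, "beagle"));

    // Assumed query; the breed "unknown" simulates a failing database call.
    static Collection<Animal> queryDatabase(String breed) {
        if ("unknown".equals(breed)) {
            throw new IllegalStateException("simulated query failure");
        }
        return TABLE.stream()
                .filter(a -> a.getBreed().equals(breed))
                .collect(Collectors.toList());
    }

    static Map<Boolean, List<Animal>> queryPartitioned(List<Animal> animalList) {
        // A pool size of 0 is rejected, so use at least one thread.
        int threads = Math.max(1, Math.min(animalList.size(), 10));
        ExecutorService queryExecutor = Executors.newFixedThreadPool(threads);
        try {
            return animalList.stream()
                    .map(animal -> CompletableFuture.completedFuture(animal.getBreed())
                            .thenApplyAsync(breed -> queryDatabase(breed), queryExecutor))
                    .collect(Collectors.toList()).stream()
                    .flatMap(cf -> cf.join().stream())
                    .collect(Collectors.partitioningBy(
                            animal -> animal.getSpecie() == Species.CAT));
        } finally {
            // Runs whether the pipeline completed or join() threw.
            queryExecutor.shutdown();
        }
    }

    public static void main(String[] args) {
        Map<Boolean, List<Animal>> result = queryPartitioned(Arrays.asList(
                new Animal(Species.CAT, "siamese"),
                new Animal(Species.DOG, "beagle")));
        List<Animal> catList = result.get(true), dogList = result.get(false);
        System.out.println(catList.size() + " cats, " + dogList.size() + " dogs");
    }
}
```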

solution1
2 ACCEPTED 2016-06-23 12:45:36
