I would like to replicate and parallelize the following behavior with Java 8 streams:
for (Animal animal : animalList) {
    // find all other animals with the same breed
    Collection<Animal> queryResult = queryDatabase(animal.getBreed());
    if (animal.getSpecie() == cat) {
        catList.addAll(queryResult);
    } else {
        dogList.addAll(queryResult);
    }
}
This is what I have so far:
final Executor queryExecutor =
    Executors.newFixedThreadPool(Math.min(animalList.size(), 10),
        new ThreadFactory() {
            @Override
            public Thread newThread(Runnable r) {
                Thread t = new Thread(r);
                t.setDaemon(true);
                return t;
            }
        });
List<CompletableFuture<Collection<Animal>>> listFutureResult = animalList.stream()
    .map(animal -> CompletableFuture.supplyAsync(
        () -> queryDatabase(animal.getBreed()), queryExecutor))
    .collect(Collectors.toList());
List<Animal> result = listFutureResult.stream()
    .map(CompletableFuture::join)
    .flatMap(subList -> subList.stream())
    .collect(Collectors.toList());
1 - I'm not sure how to split the stream so that I get two separate lists, one for cats and one for dogs.
2 - Does this solution look reasonable?
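First, consider simply using
List<Animal> result = animalList.parallelStream()
    .flatMap(animal -> queryDatabase(animal.getBreed()).stream())
    .collect(Collectors.toList());
even though it won't give you the desired concurrency of up to ten: a parallel stream runs on the common ForkJoinPool, whose parallelism is derived from the number of available processors rather than from the number of queries you want in flight. The simplicity might make up for that. Regarding the other part, splitting the result is as easy as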
Map<Boolean, List<Animal>> result = animalList.parallelStream()
    .flatMap(animal -> queryDatabase(animal.getBreed()).stream())
    .collect(Collectors.partitioningBy(animal -> animal.getSpecie() == cat));
List<Animal> catList = result.get(true), dogList = result.get(false);
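A self-contained sketch of the partitioningBy approach, so it can be run and inspected. The Species enum (standing in for the question's `cat`), the Animal class and the in-memory queryDatabase stub are all hypothetical, invented only so the example compiles; a real program would query an actual database.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class PartitionSketch {
    // Hypothetical species type; the question only shows a value named `cat`.
    enum Species { CAT, DOG }

    // Hypothetical Animal, reduced to the getters the question's code uses.
    static final class Animal {
        private final String name;
        private final String breed;
        private final Species species;

        Animal(String name, String breed, Species species) {
            this.name = name;
            this.breed = breed;
            this.species = species;
        }

        String getBreed() { return breed; }
        Species getSpecie() { return species; }
        @Override public String toString() { return name; }
    }

    // Hypothetical stand-in for the database: a fixed in-memory table.
    static final List<Animal> DATABASE = Arrays.asList(
        new Animal("Tom", "Siamese", Species.CAT),
        new Animal("Luna", "Siamese", Species.CAT),
        new Animal("Rex", "Beagle", Species.DOG),
        new Animal("Max", "Beagle", Species.DOG));

    // Returns every stored animal of the given breed.
    static Collection<Animal> queryDatabase(String breed) {
        return DATABASE.stream()
            .filter(a -> a.getBreed().equals(breed))
            .collect(Collectors.toList());
    }

    // One query per input animal on the common pool; results split into cats (true) and others (false).
    static Map<Boolean, List<Animal>> partitionBySpecies(List<Animal> animalList) {
        return animalList.parallelStream()
            .flatMap(animal -> queryDatabase(animal.getBreed()).stream())
            .collect(Collectors.partitioningBy(a -> a.getSpecie() == Species.CAT));
    }

    public static void main(String[] args) {
        List<Animal> animalList = Arrays.asList(DATABASE.get(0), DATABASE.get(2));
        Map<Boolean, List<Animal>> result = partitionBySpecies(animalList);
        List<Animal> catList = result.get(true), dogList = result.get(false);
        System.out.println("cats=" + catList + " dogs=" + dogList);
    }
}
```

partitioningBy always creates both the true and false entries, so result.get(false) is an empty list rather than null when no dogs are found.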
If you have more species than just cats and dogs, you can use Collectors.groupingBy(Animal::getSpecie) instead to get a map from each species to its list of animals.
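A minimal sketch of the groupingBy variant, assuming a hypothetical three-valued Species enum, a hypothetical Animal class and an in-memory queryDatabase stub (none of these appear in the original post).

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class GroupingSketch {
    // Hypothetical species type with more than two values.
    enum Species { CAT, DOG, BIRD }

    // Hypothetical Animal with the getters used in the question.
    static final class Animal {
        private final String name;
        private final String breed;
        private final Species species;

        Animal(String name, String breed, Species species) {
            this.name = name;
            this.breed = breed;
            this.species = species;
        }

        String getBreed() { return breed; }
        Species getSpecie() { return species; }
        @Override public String toString() { return name; }
    }

    // Hypothetical in-memory stand-in for the database.
    static final List<Animal> DATABASE = Arrays.asList(
        new Animal("Tom", "Siamese", Species.CAT),
        new Animal("Luna", "Siamese", Species.CAT),
        new Animal("Rex", "Beagle", Species.DOG),
        new Animal("Kiwi", "Budgie", Species.BIRD));

    static Collection<Animal> queryDatabase(String breed) {
        return DATABASE.stream()
            .filter(a -> a.getBreed().equals(breed))
            .collect(Collectors.toList());
    }

    // Maps each species to the animals found for it; species with no hits get no entry.
    static Map<Species, List<Animal>> groupBySpecies(List<Animal> animalList) {
        return animalList.parallelStream()
            .flatMap(animal -> queryDatabase(animal.getBreed()).stream())
            .collect(Collectors.groupingBy(Animal::getSpecie));
    }

    public static void main(String[] args) {
        List<Animal> animalList = Arrays.asList(DATABASE.get(0), DATABASE.get(2), DATABASE.get(3));
        Map<Species, List<Animal>> bySpecies = groupBySpecies(animalList);
        // Unlike partitioningBy, missing species are absent, so fall back to an empty list.
        List<Animal> catList = bySpecies.getOrDefault(Species.CAT, Collections.emptyList());
        System.out.println(bySpecies + " cats=" + catList);
    }
}
```

The design difference worth noting: partitioningBy guarantees exactly two keys, while groupingBy only creates keys for species that actually occur in the results.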
If you insist on using your own thread pool, a few things can be improved:
Executor queryExecutor = Executors.newFixedThreadPool(Math.min(animalList.size(), 10),
    r -> {
        Thread t = new Thread(r);
        t.setDaemon(true);
        return t;
    });
List<Animal> result = animalList.stream()
    .map(animal -> CompletableFuture.completedFuture(animal.getBreed())
        .thenApplyAsync(breed -> queryDatabase(breed), queryExecutor))
    .collect(Collectors.toList()).stream()
    .flatMap(cf -> cf.join().stream())
    .collect(Collectors.toList());
Your supplyAsync variant requires capturing the actual Animal instance, creating a new Supplier for each animal. In contrast, the function passed to thenApplyAsync is invariant: it performs the same operation for every parameter value. The code above assumes that getBreed is a cheap operation; if it isn't, it would be easy to pass the Animal instance to completedFuture and call getBreed() inside the async function instead.
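A minimal sketch of that alternative, where the Animal itself is wrapped so that getBreed() also runs on a pool thread. The Animal class and queryDatabase stub are hypothetical; the sketch also shuts the pool down in a finally block rather than relying on daemon threads, and clamps the pool size to at least one, because Executors.newFixedThreadPool(0) throws IllegalArgumentException for an empty input list.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

public class AsyncBreedSketch {
    // Hypothetical Animal; imagine getBreed() were expensive.
    static final class Animal {
        private final String name;
        private final String breed;

        Animal(String name, String breed) {
            this.name = name;
            this.breed = breed;
        }

        String getBreed() { return breed; }
        @Override public String toString() { return name; }
    }

    // Hypothetical in-memory stand-in for the database.
    static final List<Animal> DATABASE = Arrays.asList(
        new Animal("Tom", "Siamese"), new Animal("Luna", "Siamese"),
        new Animal("Rex", "Beagle"), new Animal("Max", "Beagle"));

    static Collection<Animal> queryDatabase(String breed) {
        return DATABASE.stream()
            .filter(a -> a.getBreed().equals(breed))
            .collect(Collectors.toList());
    }

    static List<Animal> queryAll(List<Animal> animalList) {
        // At least one thread, so an empty list does not make newFixedThreadPool throw.
        ExecutorService queryExecutor =
            Executors.newFixedThreadPool(Math.max(1, Math.min(animalList.size(), 10)));
        try {
            return animalList.stream()
                // Wrap the Animal itself; getBreed() and the query both run asynchronously.
                .map(animal -> CompletableFuture.completedFuture(animal)
                    .thenApplyAsync(a -> queryDatabase(a.getBreed()), queryExecutor))
                // Collecting first submits every query before the first join() blocks.
                .collect(Collectors.toList()).stream()
                .flatMap(cf -> cf.join().stream())
                .collect(Collectors.toList());
        } finally {
            queryExecutor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(queryAll(Arrays.asList(DATABASE.get(0), DATABASE.get(2))));
    }
}
```

The intermediate collect matters because streams are lazy: without it, the sequential stream would create and join one future at a time, so the queries would run one after another.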
The .map(CompletableFuture::join) step can be replaced by a simple chained .join() within the flatMap function. Otherwise, if you prefer method references, you should use them consistently, i.e. .map(CompletableFuture::join).flatMap(Collection::stream).
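For comparison, a sketch of the consistent method-reference style. To keep it short, breeds and animal names are plain strings and queryDatabase is a hypothetical in-memory lookup; only the shape of the stream pipeline is meant to mirror the answer.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

public class MethodRefSketch {
    // Hypothetical database: breed -> names of animals of that breed.
    static final Map<String, List<String>> DATABASE = new HashMap<>();
    static {
        DATABASE.put("Siamese", Arrays.asList("Tom", "Luna"));
        DATABASE.put("Beagle", Arrays.asList("Rex", "Max"));
    }

    static Collection<String> queryDatabase(String breed) {
        return DATABASE.getOrDefault(breed, Collections.emptyList());
    }

    static List<String> queryAll(List<String> breeds) {
        ExecutorService queryExecutor =
            Executors.newFixedThreadPool(Math.max(1, Math.min(breeds.size(), 10)));
        try {
            return breeds.stream()
                .map(breed -> CompletableFuture.completedFuture(breed)
                    .thenApplyAsync(MethodRefSketch::queryDatabase, queryExecutor))
                .collect(Collectors.toList()).stream()
                .map(CompletableFuture::join)   // Stream<Collection<String>>
                .flatMap(Collection::stream)    // Stream<String>
                .collect(Collectors.toList());
        } finally {
            queryExecutor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(queryAll(Arrays.asList("Siamese", "Beagle")));
    }
}
```

Both styles produce the same result; the choice between a lambda with a chained .join() and two method references is purely a matter of readability.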
Of course, this variant also allows using partitioningBy instead of toList.
As a final note, if you invoke shutdown on the executor service after use, there is no need to mark the threads as daemon threads:
ExecutorService queryExecutor = Executors.newFixedThreadPool(Math.min(animalList.size(), 10));
Map<Boolean, List<Animal>> result = animalList.stream()
    .map(animal -> CompletableFuture.completedFuture(animal.getBreed())
        .thenApplyAsync(breed -> queryDatabase(breed), queryExecutor))
    .collect(Collectors.toList()).stream()
    .flatMap(cf -> cf.join().stream())
    .collect(Collectors.partitioningBy(animal -> animal.getSpecie() == cat));
List<Animal> catList = result.get(true), dogList = result.get(false);
queryExecutor.shutdown();