I am writing a program to download historical quotes from a source. The source provides files over HTTP for each day, which need to be parsed and processed. The program downloads multiple files in parallel using a CompletableFuture with several stages. The first stage makes an HTTP call using HttpClient and gets the response.
The getHttpResponse() method returns a CloseableHttpResponse object. I also want to return the URL for which this HTTP request was made. The simplest way is a wrapper object holding these two fields, but a class just to contain two fields feels like too much. Is there a way to achieve this with CompletableFuture or Streams?
filesToDownload.stream()
        .map(url -> CompletableFuture.supplyAsync(() -> this.getHttpResponse(url), this.executor))
        .map(httpResponseFuture -> httpResponseFuture.thenAccept(t -> processHttpResponse(t)))
        .count();
It's not clear why you want to bring in the Stream API at all costs. Splitting the CompletableFuture use into two map operations is what causes the problem; it wouldn't exist otherwise. Besides that, using map for side effects is an abuse of the Stream API. This may break completely in Java 9 if filesToDownload is a Stream source with a known size (like almost every Collection). In that case, count() will simply return that known size without evaluating the functions passed to the map operations…
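To see the Java 9 behavior in isolation, here is a minimal sketch with no HTTP involved. The class name, method name and placeholder strings are hypothetical; the code only mirrors the question's pattern of using map for a side effect and then calling count() on a List, which is a sized source. On OpenJDK 9 and later this prints `count = 3, map calls = 0`, because the count is taken from the source size and the mapping function never runs.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class CountSkipDemo {
    // Counts how often the mapping function actually runs.
    public static final AtomicInteger mapCalls = new AtomicInteger();

    // Same shape as the question: map() used for a side effect, then count().
    public static long countWithSideEffect(List<String> urls) {
        return urls.stream()
                .map(url -> {
                    mapCalls.incrementAndGet(); // the side effect the question relies on
                    return url;
                })
                .count();
    }

    public static void main(String[] args) {
        long n = countWithSideEffect(List.of("day1", "day2", "day3"));
        // The source is SIZED and map() keeps it SIZED, so Java 9+ may skip the pipeline entirely.
        System.out.println("count = " + n + ", map calls = " + mapCalls.get());
    }
}
```

The specification permits this optimization, so code that needs the side effects must use a terminal operation that is specified to process every element, such as forEach or collect.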
If you want to pass both the URL and the CloseableHttpResponse to processHttpResponse, you can do it as easily as:
filesToDownload.forEach(url ->
    CompletableFuture.supplyAsync(() -> this.getHttpResponse(url), this.executor)
        .thenAccept(t -> processHttpResponse(t, url))
);
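The forEach variant above can be exercised with a self-contained sketch. Assumptions: getHttpResponse is a hypothetical stand-in that returns a String instead of a CloseableHttpResponse, processHttpResponse(String, String) is a hypothetical two-argument overload, and results are recorded in a map only so the URL pairing can be inspected. The point is that the lambda passed to thenAccept simply captures url, so no wrapper class is needed.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class CaptureUrlDemo {
    private final ExecutorService executor = Executors.newFixedThreadPool(4);
    // Records "url -> response" so the pairing can be checked afterwards.
    public final Map<String, String> processed = new ConcurrentHashMap<>();

    // Hypothetical stand-in for the real HTTP call.
    private String getHttpResponse(String url) {
        return "body of " + url;
    }

    // Hypothetical two-argument overload: the URL arrives via the captured variable.
    private void processHttpResponse(String response, String url) {
        processed.put(url, response);
    }

    public void downloadAll(List<String> filesToDownload) {
        filesToDownload.forEach(url ->
                CompletableFuture.supplyAsync(() -> this.getHttpResponse(url), this.executor)
                        .thenAccept(t -> processHttpResponse(t, url)));
        // A non-async thenAccept callback runs either on the calling thread or on the
        // executor thread that completes the future, so awaiting termination covers it.
        executor.shutdown();
        try {
            executor.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
        }
    }

    public static void main(String[] args) {
        CaptureUrlDemo demo = new CaptureUrlDemo();
        demo.downloadAll(List.of("https://example.com/a", "https://example.com/b"));
        System.out.println(demo.processed);
    }
}
```

Shutting down the executor is only there to make the demo wait; in a long-running program the executor would normally be reused.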
Even if you use the Stream API to collect results, there is no reason to split the CompletableFuture chain into multiple map operations:
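List<…> result = filesToDownload.stream()
    // first pass: submit every job; the URL is captured by the thenApply lambda
    .map(url -> CompletableFuture.supplyAsync(() -> this.getHttpResponse(url), this.executor)
        .thenApply(t -> processHttpResponse(t, url)))
    .collect(Collectors.toList())
    // second pass: only now wait for each result, in the original order
    .stream()
    .map(CompletableFuture::join)
    .collect(Collectors.toList());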
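A runnable version of this collect-then-join pipeline, under the assumption that fetch and process are hypothetical stand-ins for getHttpResponse and a result-returning processHttpResponse(t, url). The intermediate List is given a name here, which behaves the same as chaining `.collect(...).stream()`.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

public class CollectThenJoinDemo {
    // Hypothetical stand-in for getHttpResponse(url).
    static String fetch(String url) {
        return "body of " + url;
    }

    // Hypothetical stand-in for processHttpResponse(t, url) that returns a value.
    static String process(String response, String url) {
        return url + " -> " + response.length() + " chars";
    }

    public static List<String> downloadAll(List<String> filesToDownload, ExecutorService executor) {
        // Phase 1: the terminal collect forces every supplyAsync call to happen now.
        List<CompletableFuture<String>> futures = filesToDownload.stream()
                .map(url -> CompletableFuture.supplyAsync(() -> fetch(url), executor)
                        .thenApply(t -> process(t, url)))
                .collect(Collectors.toList());
        // Phase 2: wait for the results; join() keeps the original order.
        return futures.stream()
                .map(CompletableFuture::join)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        ExecutorService executor = Executors.newFixedThreadPool(4);
        try {
            // prints [a -> 9 chars, bb -> 10 chars]
            System.out.println(downloadAll(List.of("a", "bb"), executor));
        } finally {
            executor.shutdown();
        }
    }
}
```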
Note that this pipeline collects the CompletableFutures into a List before waiting for any result in a second Stream operation. This is preferable to using a parallel Stream operation, as it ensures that all asynchronous operations have been submitted before starting to wait.
Using a single Stream pipeline would imply waiting for the completion of the first job before even submitting the second, and using a parallel Stream would only reduce that problem instead of solving it. The outcome would depend on the execution strategy of the Stream implementation (the default Fork/Join pool), which interferes with the actual policy of your specified executor. E.g., if the specified executor is supposed to use more threads than there are CPU cores, the Stream would still submit only as many jobs at a time as there are cores, or even fewer if other jobs are running on the default Fork/Join pool.
In contrast, the behavior of the solution above will be entirely controlled by the execution strategy of the specified executor.
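The difference between waiting inside a single sequential pipeline and collecting the futures first can be made visible with a sketch. The setup is an assumption for illustration: instead of HTTP calls, each hypothetical job counts down a shared CountDownLatch and then waits up to 500 ms for the others, returning true only if every job has already arrived. With a pool of at least JOBS threads, the single pipeline deterministically yields `[false, false, false, true]` (only the last job sees the latch reach zero), while the two-phase version normally yields all `true`.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class SubmissionOrderDemo {
    public static final int JOBS = 4;

    // Arrive at the latch, then wait briefly; true only if all jobs have arrived.
    static boolean job(CountDownLatch latch) {
        latch.countDown();
        try {
            return latch.await(500, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Single sequential pipeline: each element passes through join() before the
    // next supplyAsync call, so the jobs run strictly one after another.
    public static List<Boolean> singlePipeline(ExecutorService executor) {
        CountDownLatch latch = new CountDownLatch(JOBS);
        return IntStream.range(0, JOBS)
                .mapToObj(i -> CompletableFuture.supplyAsync(() -> job(latch), executor))
                .map(CompletableFuture::join)
                .collect(Collectors.toList());
    }

    // Two phases: every job is submitted before the first join().
    public static List<Boolean> twoPhase(ExecutorService executor) {
        CountDownLatch latch = new CountDownLatch(JOBS);
        List<CompletableFuture<Boolean>> futures = IntStream.range(0, JOBS)
                .mapToObj(i -> CompletableFuture.supplyAsync(() -> job(latch), executor))
                .collect(Collectors.toList());
        return futures.stream()
                .map(CompletableFuture::join)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // The pool has JOBS threads, so all jobs can run at once if submitted together.
        ExecutorService executor = Executors.newFixedThreadPool(JOBS);
        try {
            System.out.println("single pipeline: " + singlePipeline(executor));
            System.out.println("two phases:      " + twoPhase(executor));
        } finally {
            executor.shutdown();
        }
    }
}
```

Because the concurrency here comes only from the explicit executor, raising its thread count is all it takes to run more jobs at once; no Fork/Join parallelism is involved.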