Why is CompletableFuture join/get faster in separate streams than in a single stream

Question

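For the following program, I am trying to figure out why parallelizing the tasks with 2 separate streams is fast, while using the same stream and calling join/get on the CompletableFuture makes them take longer, as if they were processed sequentially.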

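import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;

public class HelloConcurrency {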

    private static Integer sleepTask(int number) {
        System.out.println(String.format("Task with sleep time %d", number));
        try {
            TimeUnit.SECONDS.sleep(number);
        } catch (InterruptedException e) {
            e.printStackTrace();
            return -1;
        }
        return number;
    }

    public static void main(String[] args) {
        List<Integer> sleepTimes = Arrays.asList(1,2,3,4,5,6);
        System.out.println("WITH SEPARATE STREAMS FOR FUTURE AND JOIN");
        ExecutorService executorService = Executors.newFixedThreadPool(6);
        long start = System.currentTimeMillis();
        List<CompletableFuture<Integer>> futures = sleepTimes.stream()
                .map(sleepTime -> CompletableFuture.supplyAsync(() -> sleepTask(sleepTime), executorService)
                        .exceptionally(ex -> { ex.printStackTrace(); return -1; }))
                .collect(Collectors.toList());
        executorService.shutdown();
        List<Integer> result = futures.stream()
                .map(CompletableFuture::join)
                .collect(Collectors.toList());
        long finish = System.currentTimeMillis();
        long timeElapsed = (finish - start)/1000;
        System.out.println(String.format("done in %d seconds.", timeElapsed));
        System.out.println(result);

        System.out.println("WITH SAME STREAM FOR FUTURE AND JOIN");
        ExecutorService executorService2 = Executors.newFixedThreadPool(6);
        start = System.currentTimeMillis();
        List<Integer> results = sleepTimes.stream()
                .map(sleepTime -> CompletableFuture.supplyAsync(() -> sleepTask(sleepTime), executorService2)
                        .exceptionally(ex -> { ex.printStackTrace(); return -1; }))
                .map(CompletableFuture::join)
                .collect(Collectors.toList());
        executorService2.shutdown();
        finish = System.currentTimeMillis();
        timeElapsed = (finish - start)/1000;
        System.out.println(String.format("done in %d seconds.", timeElapsed));
        System.out.println(results);
    }
}

Output

WITH SEPARATE STREAMS FOR FUTURE AND JOIN
Task with sleep time 6
Task with sleep time 5
Task with sleep time 1
Task with sleep time 3
Task with sleep time 2
Task with sleep time 4
done in 6 seconds.
[1, 2, 3, 4, 5, 6]
WITH SAME STREAM FOR FUTURE AND JOIN
Task with sleep time 1
Task with sleep time 2
Task with sleep time 3
Task with sleep time 4
Task with sleep time 5
Task with sleep time 6
done in 21 seconds.
[1, 2, 3, 4, 5, 6]

Answer 1

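The two approaches are quite different; let me try to explain.

First approach: you start the async requests for all 6 tasks first, and only then call join on each of them to get the results.

Second approach: you call join immediately after starting the async request for each task. For example, after spinning up the async thread for task 1, calling join makes sure that thread completes the task, and only then is the second task started on an async thread.

Note: if you look closely at the output, in the first approach it appears in random order because all six tasks run asynchronously. In the second approach all tasks run one after another, in sequence.

I assume you have some idea of how stream map operations are executed; if not, the Javadoc of the Stream interface describes it:

To perform a computation, stream operations are composed into a stream pipeline. A stream pipeline consists of a source (which might be an array, a collection, a generator function, an I/O channel, etc), zero or more intermediate operations (which transform a stream into another stream, such as filter(Predicate)), and a terminal operation (which produces a result or side-effect, such as count() or forEach(Consumer)). Streams are lazy; computation on the source data is only performed when the terminal operation is initiated, and source elements are consumed only as needed.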
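To make the laziness concrete, here is a minimal sketch (the class name LazyPipelineOrder and the method trace are made up for illustration) that records the order in which two chained map stages are invoked on a sequential stream. In the standard implementation each element travels through the whole pipeline before the next element is pulled from the source, which is why a join placed in a second map blocks before the next future is even created.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class LazyPipelineOrder {

    // Runs two chained map stages on a sequential stream and records
    // the order in which the stages are called.
    static List<String> trace(List<Integer> input) {
        List<String> events = new ArrayList<>();
        input.stream()
                .map(n -> { events.add("first map: " + n); return n; })
                .map(n -> { events.add("second map: " + n); return n; })
                .collect(Collectors.toList()); // terminal operation drives the pipeline
        return events;
    }

    public static void main(String[] args) {
        // Prints "first map: 1", "second map: 1", "first map: 2", "second map: 2", ...
        trace(Arrays.asList(1, 2, 3)).forEach(System.out::println);
    }
}
```

The stages alternate per element instead of running all first maps and then all second maps. Replace the first stage with supplyAsync and the second with join and you get the sequential behaviour from the question.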

Answer 2

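The stream framework does not define the order in which map operations are executed on stream elements, because it is not intended for use cases where that could be a relevant issue. As a result, the particular way your second version executes is essentially equivalent to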

List<Integer> results = new ArrayList<>();
for (Integer sleepTime : sleepTimes) {
  results.add(CompletableFuture
     .supplyAsync(() -> sleepTask(sleepTime), executorService2)
     .exceptionally(ex -> { ex.printStackTrace(); return -1; })
     .join());
}

...which is itself essentially equivalent to

List<Integer> results = new ArrayList<>();
for (Integer sleepTime : sleepTimes) {
  results.add(sleepTask(sleepTime));
}
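For contrast, the two-stream version from the question corresponds roughly to two loops: one that submits everything and one that waits. The sketch below is only an illustration under assumptions: TwoLoopEquivalent and run are hypothetical names, and sleepTask is a stand-in that sleeps for milliseconds rather than seconds. It logs what the calling thread does, so the ordering is easy to see.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class TwoLoopEquivalent {

    // Hypothetical stand-in for the question's sleepTask; sleeps for milliseconds.
    static Integer sleepTask(int millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return -1;
        }
        return millis;
    }

    // Returns a log of what the calling thread did, in order.
    // Assumes sleepTimes is non-empty (a fixed pool needs at least one thread).
    static List<String> run(List<Integer> sleepTimes) {
        List<String> events = new ArrayList<>();
        ExecutorService pool = Executors.newFixedThreadPool(sleepTimes.size());
        List<CompletableFuture<Integer>> futures = new ArrayList<>();
        // Loop 1: submit every task; nothing blocks here.
        for (Integer t : sleepTimes) {
            futures.add(CompletableFuture.supplyAsync(() -> sleepTask(t), pool));
            events.add("submitted " + t);
        }
        pool.shutdown(); // already submitted tasks still run to completion
        // Loop 2: only now wait for the results, in list order.
        for (CompletableFuture<Integer> f : futures) {
            events.add("joined " + f.join());
        }
        return events;
    }

    public static void main(String[] args) {
        run(Arrays.asList(300, 100, 200)).forEach(System.out::println);
    }
}
```

All "submitted" lines come before the first "joined" line, so the total wait is bounded by the slowest task rather than by the sum of all of them.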

Answer 3

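@Deadpool answered it pretty well; I am just adding my answer in case it helps someone understand it better.

I was able to get to the answer by adding more prints to both approaches.

TLDR

2 stream approach: we start all 6 tasks asynchronously and then call join on each of them to get the results in a separate stream.
1 stream approach: we call join immediately after starting each task. For example, after spinning up a thread for task 1, calling join makes the calling thread wait for task 1 to complete, and only then is the second task started on an async thread.

Note: also, if we look closely at the output, in the 1 stream approach the output appears in sequential order because all six tasks were executed one after another. In the 2 stream approach all tasks were executed in parallel, hence the random order.

Note 2: if we replace stream() with parallelStream() in the 1 stream approach, it behaves much like the 2 stream approach, provided the common fork/join pool that backs parallel streams has enough threads to process the elements concurrently.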
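A minimal sketch of the parallelStream() variant, under assumptions: ParallelStreamJoin and run are made-up names, and sleepTask here sleeps for milliseconds to keep the demo short. How many elements are actually processed at once depends on the parallelism of the common fork/join pool on your machine, so treat the timing as indicative only.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

public class ParallelStreamJoin {

    // Hypothetical stand-in for the question's sleepTask; sleeps for milliseconds.
    static Integer sleepTask(int millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return -1;
        }
        return millis;
    }

    // Assumes sleepTimes is non-empty (a fixed pool needs at least one thread).
    static List<Integer> run(List<Integer> sleepTimes) {
        ExecutorService pool = Executors.newFixedThreadPool(sleepTimes.size());
        try {
            return sleepTimes.parallelStream()
                    .map(t -> CompletableFuture.supplyAsync(() -> sleepTask(t), pool))
                    // Each worker thread still blocks in join, but several
                    // elements can be in flight at the same time.
                    .map(CompletableFuture::join)
                    .collect(Collectors.toList()); // keeps encounter order
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        List<Integer> results = run(Arrays.asList(100, 200, 300, 400));
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(results + " in about " + elapsedMs + " ms");
    }
}
```

The result list keeps the input order because collect respects encounter order on an ordered source. Collecting the futures first, as in the 2 stream approach, does not rely on the common pool's size, which is one reason to prefer it.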

更多证据

I added more prints to the streams, which gave the following output and confirmed the notes above:

1 stream:

List<Integer> results = sleepTimes.stream()
                .map(sleepTime -> CompletableFuture.supplyAsync(() -> sleepTask(sleepTime), executorService2)
                        .exceptionally(ex -> { ex.printStackTrace(); return -1; }))
                .map(f  -> {
                    int num = f.join();
                    System.out.println(String.format("doing join on task %d", num));
                    return num;
                })
                .collect(Collectors.toList());



WITH SAME STREAM FOR FUTURE AND JOIN
Task with sleep time 1
doing join on task 1
Task with sleep time 2
doing join on task 2
Task with sleep time 3
doing join on task 3
Task with sleep time 4
doing join on task 4
Task with sleep time 5
doing join on task 5
Task with sleep time 6
doing join on task 6
done in 21 seconds.
[1, 2, 3, 4, 5, 6]

2 streams:

List<CompletableFuture<Integer>> futures = sleepTimes.stream()
          .map(sleepTime -> CompletableFuture.supplyAsync(() -> sleepTask(sleepTime), executorService)
                  .exceptionally(ex -> { ex.printStackTrace(); return -1; }))
          .collect(Collectors.toList());

List<Integer> result = futures.stream()
            .map(f  -> {
                int num = f.join();
                System.out.println(String.format("doing join on task %d", num));
                return num;
            })
            .collect(Collectors.toList());



WITH SEPARATE STREAMS FOR FUTURE AND JOIN
Task with sleep time 2
Task with sleep time 5
Task with sleep time 3
Task with sleep time 1
Task with sleep time 4
Task with sleep time 6
doing join on task 1
doing join on task 2
doing join on task 3
doing join on task 4
doing join on task 5
doing join on task 6
done in 6 seconds.
[1, 2, 3, 4, 5, 6]

3 answers

Answer 1
8, accepted, 2019-11-04 20:46:41

Answer 2
2, 2019-11-04 20:56:14

Answer 3
2, 2019-11-04 21:24:39
