
Parallel Infinite Java Streams run out of Memory

I'm trying to understand why the following Java program gives an OutOfMemoryError, while the corresponding program without .parallel() doesn't.

System.out.println(Stream
    .iterate(1, i -> i+1)
    .parallel()
    .flatMap(n -> Stream.iterate(n, i -> i+n))
    .mapToInt(Integer::intValue)
    .limit(100_000_000)
    .sum()
);

I have two questions:

  1. What is the intended output of this program?

    Without .parallel(), this seems to simply output sum(1+2+3+...), meaning it simply "gets stuck" at the first stream in the flatMap, which makes sense (see the sketch after this list).

    With parallel I don't know if there is an expected behaviour, but my guess would be that it somehow interleaves the first n or so streams, where n is the number of parallel workers. It could also be slightly different based on the chunking/buffering behaviour.

  2. What causes it to run out of memory? I'm specifically trying to understand how these streams are implemented under the hood.

    I'm guessing something blocks the stream, so it never finishes and never gets to discard the generated values, but I don't quite know in which order things are evaluated and where buffering occurs.
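For reference, here is the sequential variant from question 1 as a runnable sketch (not part of the original post). On Java 11 it stays on the first inner stream (n = 1), which already yields 1, 2, 3, ..., so the limit is satisfied without ever pulling from a second inner stream:

System.out.println(Stream
    .iterate(1, i -> i+1)
    .flatMap(n -> Stream.iterate(n, i -> i+n)) // only the n = 1 inner stream is ever consumed
    .mapToInt(Integer::intValue)
    .limit(100_000_000)
    .sum() // note: IntStream.sum() returns an int, so the printed value wraps around
);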

Edit: In case it is relevant, I'm using Java 11.

Edit 2: Apparently the same thing happens even for the simple program IntStream.iterate(1,i->i+1).limit(1000_000_000).parallel().sum(), so it might have to do with the laziness of limit rather than flatMap.

You say “but I don't quite know in which order things are evaluated and where buffering occurs”, which is precisely what parallel streams are about. The order of evaluation is unspecified.

A critical aspect of your example is the .limit(100_000_000). This implies that the implementation can't just sum up arbitrary values, but must sum up the first 100,000,000 numbers. Note that in the reference implementation, .unordered().limit(100_000_000) doesn't change the outcome, which indicates that there's no special implementation for the unordered case, but that's an implementation detail.
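Concretely, the check mentioned above looks like this (a sketch; only the added .unordered() differs from the original program, and in the reference implementation it still runs out of memory):

System.out.println(Stream
    .iterate(1, i -> i+1)
    .parallel()
    .flatMap(n -> Stream.iterate(n, i -> i+n))
    .mapToInt(Integer::intValue)
    .unordered() // no effect here: there is no special unordered limit implementation
    .limit(100_000_000)
    .sum()
);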

Now, when worker threads process the elements, they can't just sum them up, as they have to know which elements they are allowed to consume, which depends on how many elements are preceding their specific workload. Since this stream doesn't know its size, this can only be known when the prefix elements have been processed, which never happens for infinite streams. So the worker threads keep buffering until this information becomes available.
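For contrast, a source that reports its size up front doesn't have this problem. This sketch (not part of the original answer) uses IntStream.rangeClosed, whose spliterator is SIZED, so every worker knows exactly how many elements precede its chunk and can sum immediately instead of buffering:

System.out.println(IntStream
    .rangeClosed(1, 100_000_000) // SIZED source: each split knows its position up front
    .parallel()
    .asLongStream() // widen to long to avoid overflowing the sum
    .sum() // prints 5000000050000000 and terminates normally
);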

In principle, when a worker thread knows that it processes the leftmost¹ work-chunk, it could sum up the elements immediately, count them, and signal the end when reaching the limit. So the Stream could terminate, but this depends on a lot of factors.

In your case, a plausible scenario is that the other worker threads are faster in allocating buffers than the leftmost job is counting. In this scenario, subtle changes to the timing could make the stream occasionally return with a value.

When we slow down all worker threads except the one processing the leftmost chunk, we can make the stream terminate (at least in most runs):

// requires: import java.util.concurrent.locks.LockSupport;
System.out.println(IntStream
    .iterate(1, i -> i+1)
    .parallel()
    .peek(i -> { if(i != 1) LockSupport.parkNanos(1_000_000_000); }) // stall every worker except the one processing the leftmost chunk
    .flatMap(n -> IntStream.iterate(n, i -> i+n))
    .limit(100_000_000)
    .sum()
);

¹ I'm following a suggestion by Stuart Marks to use left-to-right order when talking about the encounter order rather than the processing order.

My best guess is that adding parallel() changes the internal behavior of flatMap(), which already had problems being evaluated lazily before.

The OutOfMemoryError that you are getting was reported in [JDK-8202307] Getting a java.lang.OutOfMemoryError: Java heap space when calling Stream.iterator().next() on a stream which uses an infinite/very big Stream in flatMap. If you look at the ticket, it's more or less the same stack trace that you are getting. The ticket was closed as Won't Fix with the following reason:

The iterator() and spliterator() methods are "escape hatches" to be used when it's not possible to use other operations. They have some limitations because they turn what is a push model of the stream implementation into a pull model. Such a transition requires buffering in certain cases, such as when an element is (flat) mapped to two or more elements. It would significantly complicate the stream implementation, likely at the expense of common cases, to support a notion of back-pressure to communicate how many elements to pull through nested layers of element production.
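The ticket's scenario can be reproduced in a few lines. The following is a sketch based on the ticket's description (assuming Java 11 behaviour): iterator() switches the stream to a pull model, but flatMap still pushes the entire inner stream into a buffer, so a single next() on an infinite inner stream exhausts the heap:

var it = Stream.of(1)
    .flatMap(n -> Stream.iterate(n, i -> i + n)) // infinite inner stream
    .iterator(); // pull model: the push side must be buffered
System.out.println(it.next()); // throws OutOfMemoryError instead of printing 1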

OOME is caused not by the stream being infinite, but by the fact that it isn't.

I.e., if you comment out the .limit(...), it will never run out of memory -- but of course, it will never end either.

Once it's split, the stream can only keep track of the number of elements if they're accumulated within each thread (looks like the actual accumulator is Spliterators$ArraySpliterator#array).

Looks like you can reproduce it without flatMap; just run the following with -Xmx128m:

    System.out.println(Stream
            .iterate(1, i -> i + 1)
            .parallel()
      //    .flatMap(n -> Stream.iterate(n, i -> i+n))
            .mapToInt(Integer::intValue)
            .limit(100_000_000)
            .sum()
    );

However, after commenting out the limit(), it should run fine until you decide to spare your laptop.

Besides the actual implementation details, here's what I think is happening:

With limit, the sum reducer wants the first X elements to sum up, so no thread can emit partial sums. Each "slice" (thread) will need to accumulate elements and pass them through. Without limit, there's no such constraint, so each "slice" will just compute the partial sum out of the elements it gets (forever), assuming it will emit the result eventually.
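In code, the limit-less case described above would look like this sketch; it never terminates, but, matching the observation above, it also doesn't run out of memory, because every worker folds the elements it gets into a running partial sum instead of buffering them:

System.out.println(Stream
    .iterate(1, i -> i+1)
    .parallel()
    .mapToInt(Integer::intValue)
    .sum() // never returns; each worker only keeps a partial sum
);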
