Processing random numbers in parallel Java stream

I want to generate 5 distinct random numbers from the range 0 - 50 and then execute some operation on them in parallel. When I wrote this, the program never terminated:

new Random().ints(0, 50)
            .distinct()
            .limit(5)
            .parallel()
            .forEach(d -> System.out.println("s: " + d));

I tried to debug it using peek. I got an infinite number of c: lines and 50 d: lines, but zero l: or s: lines:

new Random().ints(0, 50)
            .peek(d -> System.out.println("c: " + d))
            .distinct()
            .peek(d -> System.out.println("d: " + d))
            .limit(5)
            .peek(d -> System.out.println("l: " + d))
            .parallel()
            .forEach(d -> System.out.println("s: " + d));

What is wrong with my implementation?

First, please note that .parallel() changes the parallel status of the whole pipeline, so it affects all the operations, not only the subsequent ones. In your case

new Random().ints(0, 50)
            .distinct()
            .limit(5)
            .parallel()
            .forEach(d -> System.out.println("s: " + d));

is the same as

new Random().ints(0, 50)
            .parallel()
            .distinct()
            .limit(5)
            .forEach(d -> System.out.println("s: " + d));

You cannot parallelize only part of the pipeline. It's either parallel or not.
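The point above can be checked with a small sketch. It assumes only the standard BaseStream.isParallel() method; the class and method names are made up for illustration. The mode reported for the pipeline depends on the last parallel()/sequential() call, wherever it sits in the chain.

```java
import java.util.stream.IntStream;

// Illustrative class name, not part of the original question or answer.
final class PipelineModeDemo {

    // parallel() appears after map(), yet the whole pipeline is marked parallel.
    static boolean parallelCalledLast() {
        return IntStream.range(0, 10)
                .map(i -> i * 2)
                .parallel()
                .isParallel();
    }

    // The last mode-switching call wins: sequential() overrides the earlier parallel().
    static boolean sequentialCalledLast() {
        return IntStream.range(0, 10)
                .parallel()
                .map(i -> i * 2)
                .sequential()
                .isParallel();
    }

    public static void main(String[] args) {
        System.out.println("parallel() last:   " + parallelCalledLast());
        System.out.println("sequential() last: " + sequentialCalledLast());
    }
}
```

The first method reports a parallel pipeline and the second a sequential one, which is why moving .parallel() to the end of the chain does not confine parallelism to forEach.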

Now back to your question. As Random.ints is an unordered stream, the unordered implementations of distinct and limit are selected, so it's not a duplicate of this question (where the problem was in the ordered distinct implementation). Here the problem is in the unordered limit() implementation. To reduce possible contention, it does not check the total count of elements found in different threads until every subtask gets at least 128 elements or the upstream is exhausted (see the implementation, 1 << 7 = 128). In your case the upstream distinct() finds only 50 different elements and desperately traverses the input in the hope of finding more, but the downstream limit() doesn't signal to stop the processing, because it wants to collect at least 128 elements before checking whether the limit is reached (which is not very smart, as the limit is less than 128). So to make this work, there must be at least (128 * number of CPUs) different elements to choose from. On my 4-core machine, new Random().ints(0, 512) succeeds, while new Random().ints(0, 511) gets stuck.

To fix this, I'd suggest collecting the random numbers sequentially and creating a new stream from them:

int[] ints = new Random().ints(0, 50).distinct().limit(5).toArray();
Arrays.stream(ints).parallel()
      .forEach(d -> System.out.println("s: " + d));

I assume that you want to perform some expensive downstream processing. In this case parallelizing the generation of 5 random numbers is not very useful. This part will be faster when performed sequentially.
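A self-contained version of this two-step approach might look like the sketch below. The class name, the helper methods, and the expensiveOperation placeholder are assumptions made for illustration; only the pipeline shape (sequential distinct().limit(), then a parallel stream over the resulting array) comes from the answer.

```java
import java.util.Arrays;
import java.util.Random;

// Illustrative names; expensiveOperation is a hypothetical stand-in for the real work.
final class SequentialThenParallel {

    static int expensiveOperation(int value) {
        return value * value;
    }

    // Step 1: pick `count` distinct values from [0, bound) sequentially.
    static int[] pickDistinct(Random random, int count, int bound) {
        if (count < 0 || count > bound) {
            // Asking for more distinct values than exist would never terminate.
            throw new IllegalArgumentException("count must be between 0 and bound");
        }
        return random.ints(0, bound).distinct().limit(count).toArray();
    }

    // Step 2: run the costly part in parallel over the finite array.
    static int[] processInParallel(int[] values) {
        return Arrays.stream(values)
                .parallel()
                .map(SequentialThenParallel::expensiveOperation)
                .toArray();
    }

    public static void main(String[] args) {
        int[] picked = pickDistinct(new Random(), 5, 50);
        int[] results = processInParallel(picked);
        System.out.println(Arrays.toString(picked) + " -> " + Arrays.toString(results));
    }
}
```

Because the second stream has an array as its source, it is sized and finite, so the parallel run does not depend on limit() to stop an infinite upstream.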

Update: filed a bug report and submitted a patch.

Your call to ints(0, 50)

Returns an effectively unlimited stream of pseudorandom int values, each conforming to the given origin (inclusive) and bound (exclusive).
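For contrast, java.util.Random also has a sized overload, ints(long streamSize, int randomNumberOrigin, int randomNumberBound), which produces a finite stream. The sketch below (class and method names are illustrative) shows only the size and the inclusive/exclusive bounds. Note that a sized stream may contain duplicates, so applying distinct() to it can leave fewer elements than the requested size.

```java
import java.util.Arrays;
import java.util.Random;

// Illustrative class name; demonstrates the sized overload of Random.ints.
final class BoundedRandomInts {

    // Returns exactly `size` pseudorandom values, each in [origin, bound).
    static int[] sample(long size, int origin, int bound) {
        return new Random().ints(size, origin, bound).toArray();
    }

    public static void main(String[] args) {
        int[] values = sample(10, 0, 50);
        // Ten values, each at least 0 and strictly less than 50; repeats are possible.
        System.out.println(Arrays.toString(values));
    }
}
```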

I originally thought that the unterminated IntStream was the problem, but I was able to reproduce the problem myself.

new Random().ints(0, 50)
            .distinct().limit(5)
            .parallel().forEach(a -> System.out.println(a));

goes into an infinite loop, while

new Random().ints(0, 50)
            .distinct().limit(5)
            .forEach(a -> System.out.println(a));

finishes correctly.

My knowledge of streams is not good enough to explain it, but clearly the parallelization doesn't play nicely here (possibly due to the infinite stream).

The closest option to what you're trying to do is perhaps to use iterate and unordered:

Random ran = new Random();
IntStream.iterate(ran.nextInt(50), i -> ran.nextInt(50))
    .unordered()
    .distinct()
    .limit(5)
    .parallel()
    .forEach(System.out::println);

Using an infinite stream together with distinct and parallel can be expensive or may never produce a result. See the API Note or this question for more information.
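One way to sidestep the infinite source entirely, offered here as an alternative technique rather than something proposed in the answers above, is to shuffle the finite pool of candidates and take the first few. The class and method names below are made up for illustration; the calls used (Collections.shuffle, List.subList, parallelStream) are standard library methods.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Illustrative class name; swaps the infinite random stream for a shuffled finite pool.
final class ShuffleSample {

    // Returns `count` distinct values drawn from [0, bound).
    static List<Integer> pickDistinct(Random random, int count, int bound) {
        if (count < 0 || count > bound) {
            throw new IllegalArgumentException("count must be between 0 and bound");
        }
        List<Integer> pool = IntStream.range(0, bound)
                .boxed()
                .collect(Collectors.toCollection(ArrayList::new));
        Collections.shuffle(pool, random);
        // Copy the prefix so the result does not keep a view of the whole pool.
        return new ArrayList<>(pool.subList(0, count));
    }

    public static void main(String[] args) {
        pickDistinct(new Random(), 5, 50)
                .parallelStream()
                .forEach(d -> System.out.println("s: " + d));
    }
}
```

This costs memory proportional to the size of the range, so it suits small ranges such as 0 to 50; for very large ranges the sequential distinct().limit() approach is likely the better fit.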
