Parallel stream from a HashSet doesn't run in parallel

I have a collection of elements that I want to process in parallel. When I use a List, parallelism works. However, when I use a Set, it does not run in parallel.

I wrote a code sample that shows the problem:

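import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.TimeUnit;

public class ParallelTest {

    public static void main(String[] args) {
        ParallelTest test = new ParallelTest();

        List<Integer> list = Arrays.asList(1, 2);
        Set<Integer> set = new HashSet<>(list);

        // Custom pool, so the streams do not run in the common pool.
        ForkJoinPool forkJoinPool = new ForkJoinPool(4);

        System.out.println("set print");
        try {
            forkJoinPool.submit(() ->
                set.parallelStream().forEach(test::print)
            ).get();
        } catch (Exception e) {
            e.printStackTrace();
            return;
        }

        System.out.println("\n\nlist print");
        try {
            forkJoinPool.submit(() ->
                list.parallelStream().forEach(test::print)
            ).get();
        } catch (Exception e) {
            e.printStackTrace();
            return;
        }
    }

    // Sleeps for one second per element so overlapping execution shows up in the output.
    private void print(int i) {
        System.out.println("start: " + i);
        try {
            TimeUnit.SECONDS.sleep(1);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt status
        }
        System.out.println("end: " + i);
    }
}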

This is the output that I get on Windows 7:

set print
start: 1
end: 1
start: 2
end: 2

list print
start: 2
start: 1
end: 1
end: 2

We can see that the first element from the Set had to finish before the second element was processed. For the List, the second element starts before the first element finishes.

Can you tell me what causes this issue, and how to avoid it using a Set collection?

I can reproduce the behavior you see, where the observed parallelism doesn't match the parallelism of the fork-join pool you've specified. After setting the fork-join pool parallelism to 10, and increasing the number of elements in the collection to 50, I see the parallelism of the list-based stream rising only to 6, whereas the parallelism of the set-based stream never gets above 2.
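One way to put numbers on this is to count how many elements are being processed at the same moment. The following is a minimal sketch, not the exact program used for the figures above: the class name `PeakParallelism`, the helper `measure`, and the 50 ms sleep are all assumptions made for illustration. The peak it reports will vary with the machine and the JDK version.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class PeakParallelism {

    /**
     * Runs a parallel forEach over the source inside the given pool and
     * returns the largest number of elements that were in flight at once.
     */
    static int measure(Collection<Integer> source, ForkJoinPool pool) throws Exception {
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        pool.submit(() -> source.parallelStream().forEach(i -> {
            int now = running.incrementAndGet();
            peak.accumulateAndGet(now, Math::max); // remember the highest count seen
            try {
                TimeUnit.MILLISECONDS.sleep(50);   // simulated work (hypothetical duration)
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            running.decrementAndGet();
        })).get();
        return peak.get();
    }

    public static void main(String[] args) throws Exception {
        List<Integer> list = new ArrayList<>();
        for (int i = 1; i <= 50; i++) {
            list.add(i);
        }
        ForkJoinPool pool = new ForkJoinPool(10);
        try {
            System.out.println("list peak: " + measure(list, pool));
            System.out.println("set peak:  " + measure(new HashSet<>(list), pool));
        } finally {
            pool.shutdown();
        }
    }
}
```

Comparing the two printed peaks against the pool's parallelism of 10 shows how far the stream falls short of what the pool could offer.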

Note, however, that this technique of submitting a task to a fork-join pool to run the parallel stream in that pool is an implementation "trick" and is not guaranteed to work. Indeed, the threads or thread pool that is used for execution of parallel streams is unspecified. By default, the common fork-join pool is used, but in different environments, different thread pools might end up being used. (Consider a container within an application server.)

In the java.util.stream.AbstractTask class, the LEAF_TARGET field determines the amount of splitting that is done, which in turn determines the amount of parallelism that can be achieved. The value of this field is based on ForkJoinPool.getCommonPoolParallelism(), which of course uses the parallelism of the common pool, not whatever pool happens to be running the tasks.
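To make the arithmetic concrete, here is a small sketch that paraphrases my reading of the JDK 8 source. It is not the JDK code itself: the class `LeafTargetSketch` and both method names are hypothetical, and the factor of four leaf tasks per unit of parallelism is what I understand JDK 8 to use, so treat it as an assumption.

```java
import java.util.concurrent.ForkJoinPool;

public class LeafTargetSketch {

    /** Assumed rule: aim for four leaf tasks per unit of common-pool parallelism. */
    static int leafTarget(int commonPoolParallelism) {
        return Math.max(commonPoolParallelism, 1) << 2;
    }

    /** Size estimate at or below which a task stops splitting (never less than 1). */
    static long targetSize(long sizeEstimate, int commonPoolParallelism) {
        long est = sizeEstimate / leafTarget(commonPoolParallelism);
        return est > 0L ? est : 1L;
    }

    public static void main(String[] args) {
        // Note: this reads the common pool's value, not that of any custom pool.
        int common = ForkJoinPool.getCommonPoolParallelism();
        System.out.println("common pool parallelism:   " + common);
        System.out.println("threshold for 50 elements: " + targetSize(50, common));
        System.out.println("threshold for 2 elements:  " + targetSize(2, common));
    }
}
```

Under these assumptions, a common pool parallelism of 3 gives a leaf target of 12, so a 50-element source stops splitting once a chunk's estimated size drops to 4 or fewer, no matter how large the custom pool is.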

Arguably this is a bug (see OpenJDK issue JDK-8190974), although this entire area is unspecified anyway. It definitely needs development, for example in splitting policy, the amount of parallelism available, and dealing with blocking tasks, among other issues. A future release of the JDK may address some of these issues.
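Splitting policy also depends on the source. As an illustration, the sketch below uses only the public Spliterator API to split each collection once and show which elements land in each half. The class `SplitDemo` and the helper `splitOnce` are hypothetical names. How a HashSet divides its elements is an implementation detail; on OpenJDK's hash-table-backed implementation I would expect it to split by bucket range rather than by element count, so a tiny set may put all its elements in one half.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Spliterator;

public class SplitDemo {

    /** Splits the source's spliterator once and returns [prefix elements, remaining elements]. */
    static List<List<Integer>> splitOnce(Collection<Integer> source) {
        Spliterator<Integer> rest = source.spliterator();
        Spliterator<Integer> prefix = rest.trySplit(); // null if the source declines to split
        List<Integer> first = new ArrayList<>();
        List<Integer> second = new ArrayList<>();
        if (prefix != null) {
            prefix.forEachRemaining(first::add);
        }
        rest.forEachRemaining(second::add);
        return Arrays.asList(first, second);
    }

    public static void main(String[] args) {
        List<Integer> list = Arrays.asList(1, 2);
        System.out.println("list halves: " + splitOnce(list));
        System.out.println("set halves:  " + splitOnce(new HashSet<>(list)));
    }
}
```

If one half of the set comes back empty, a single task ends up with both elements and has to process them one after the other, which would match the sequential output in the question.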

Meanwhile, it is possible to control the parallelism of the common fork-join pool through the use of system properties. If you add this line to your program,

System.setProperty("java.util.concurrent.ForkJoinPool.common.parallelism", "10");

and you run the streams in the common pool (or if you submit them to your own pool that has a sufficiently high level of parallelism set) you will observe that many more tasks are run in parallel.

You can also set this property on the command line using the -D option.
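One caveat when setting the property in code: the common pool reads it when the pool is first initialized, so the call has to come before anything in the program touches the common pool or runs a parallel stream. A minimal sketch follows; the class `CommonPoolConfig` and the helper `requestCommonParallelism` are hypothetical names, and only the property key comes from the JDK.

```java
import java.util.concurrent.ForkJoinPool;

public class CommonPoolConfig {

    static final String KEY = "java.util.concurrent.ForkJoinPool.common.parallelism";

    /** Sets the property; it only takes effect if the common pool has not been initialized yet. */
    static void requestCommonParallelism(int parallelism) {
        System.setProperty(KEY, Integer.toString(parallelism));
    }

    public static void main(String[] args) {
        // Must run first, before any parallel stream or common-pool use.
        requestCommonParallelism(10);

        // Prints the value the common pool actually picked up.
        System.out.println("common pool parallelism: " + ForkJoinPool.getCommonPoolParallelism());

        // Command-line equivalent, which avoids the ordering concern:
        //   java -Djava.util.concurrent.ForkJoinPool.common.parallelism=10 CommonPoolConfig
    }
}
```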

Again, this is not guaranteed behavior, and it may change in the future. But this technique will probably work for JDK 8 implementations for the foreseeable future.

UPDATE 2019-06-12: The bug JDK-8190974 was fixed in JDK 10, and the fix has been backported to an upcoming JDK 8u release (8u222).
