
parallelStream vs stream.parallel

I have been curious about the difference between Collection.parallelStream() and Collection.stream().parallel(). According to the Javadocs, parallelStream() tries to return a parallel stream, whereas stream().parallel() returns a parallel stream. Through some testing of my own, I have found no differences. Where does the difference between these two methods lie? Is one implementation more time-efficient than the other? Thanks.

Even if they act the same at the moment, there is a difference, at least in their documentation, as you correctly pointed out; as far as I can tell, that might be exploited in the future.

At the moment, the parallelStream method is defined in the Collection interface as:

default Stream<E> parallelStream() {
    return StreamSupport.stream(spliterator(), true);
}

Being a default method, it can be overridden in implementations (and that is what the inner classes of Collections actually do).

That hints that even though the default method returns a parallel Stream, there could be collections that override this method to return a non-parallel Stream. That is probably the reason the documentation is worded the way it is.
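To make this concrete, here is a minimal sketch of a collection that overrides parallelStream() to return a sequential stream. SequentialOnlyList and SequentialOnlyListDemo are hypothetical names made up for illustration; they are not JDK classes.

```java
import java.util.AbstractList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical collection, for illustration only: it opts out of parallelism
// by overriding the default parallelStream().
class SequentialOnlyList<E> extends AbstractList<E> {
    private final List<E> items;

    SequentialOnlyList(List<E> items) {
        this.items = items;
    }

    @Override
    public E get(int index) {
        return items.get(index);
    }

    @Override
    public int size() {
        return items.size();
    }

    @Override
    public Stream<E> parallelStream() {
        // Permitted by the Collection contract, which only promises a
        // "possibly parallel" stream.
        return stream();
    }
}

class SequentialOnlyListDemo {
    public static void main(String[] args) {
        List<Integer> list = new SequentialOnlyList<>(Arrays.asList(1, 2, 3));
        System.out.println(list.parallelStream().isParallel());     // false
        System.out.println(list.stream().parallel().isParallel());  // true
    }
}
```

With such a collection, stream().parallel() is the only one of the two calls that is guaranteed to yield a parallel stream.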

At the same time, even if parallelStream returns a sequential stream, it is still a Stream, and you could easily call parallel on it:

  someCollection
       .parallelStream() // may actually be sequential
       .parallel();      // force it to be parallel

At least for me, this looks weird.

It seems that the documentation should somehow state that after calling parallelStream there should be no reason to call parallel again to force it, since that might be useless or even bad for the processing.
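As a small illustration of why the extra call is redundant on an already parallel stream: the parallel/sequential mode is a property of the whole pipeline, and isParallel() reports what the terminal operation would use. The class and method names below are made up for this sketch.

```java
import java.util.Arrays;
import java.util.List;

class StreamModeDemo {

    // parallel() on a stream that is already parallel leaves the mode unchanged.
    static boolean parallelTwice(List<String> names) {
        return names.parallelStream().parallel().isParallel();
    }

    // The mode applies to the whole pipeline: a later sequential() switches
    // the entire pipeline back, including the stages before it.
    static boolean parallelThenSequential(List<String> names) {
        return names.parallelStream()
                .map(String::toUpperCase)
                .sequential()
                .isParallel();
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("a", "b", "c");
        System.out.println(parallelTwice(names));          // true
        System.out.println(parallelThenSequential(names)); // false
    }
}
```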

EDIT

For anyone reading this: please also read the comments by Holger; they cover cases beyond what I said in this answer.

There is no difference between Collection.parallelStream() and Collection.stream().parallel(). Both will divide the stream to the extent that the underlying spliterator allows, and both will run using the default ForkJoinPool (unless already running inside another one).
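A rough way to observe both points is to check isParallel() and to record which threads process the elements. This is only a sketch: ParallelEquivalenceDemo and workerThreads are names invented here, and running a parallel stream inside a custom ForkJoinPool relies on how the current implementation schedules its tasks, which as far as I know is not a documented guarantee.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class ParallelEquivalenceDemo {

    // Collects the names of the threads that process the elements of a parallel stream.
    static Set<String> workerThreads(List<Integer> data) {
        Set<String> names = ConcurrentHashMap.newKeySet();
        data.stream().parallel().forEach(i -> names.add(Thread.currentThread().getName()));
        return names;
    }

    public static void main(String[] args) throws Exception {
        List<Integer> data = IntStream.range(0, 10_000).boxed().collect(Collectors.toList());

        // Both ways of obtaining the stream report parallel mode for an ArrayList.
        System.out.println(data.parallelStream().isParallel());
        System.out.println(data.stream().parallel().isParallel());

        // Started from the main thread: typically main plus common-pool workers.
        System.out.println("default:     " + workerThreads(data));

        // Started from inside another pool: typically that pool's workers.
        ForkJoinPool pool = new ForkJoinPool(2);
        try {
            Callable<Set<String>> task = () -> workerThreads(data);
            System.out.println("custom pool: " + pool.submit(task).get());
        } finally {
            pool.shutdown();
        }
    }
}
```

The thread names printed will vary between runs and machines, so no particular output is shown here.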

import java.util.ArrayList;
import java.util.List;

class Employee {
    String name;
    int salary;

    public int getSalary() {
        return salary;
    }

    public void setSalary(int salary) {
        this.salary = salary;
    }

    public Employee(String name, int salary) {
        this.name = name;
        this.salary = salary;
    }
}
class ParallelStream {

    public static void main(String[] args) {

        long t1, t2;
        List<Employee> eList = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            eList.add(new Employee("A", 20000));
            eList.add(new Employee("B", 3000));
            eList.add(new Employee("C", 15002));
            eList.add(new Employee("D", 7856));
            eList.add(new Employee("E", 200));
            eList.add(new Employee("F", 50000));
        }

        /***** Here We Are Creating A 'Sequential Stream' & Displaying The Result *****/
        t1 = System.currentTimeMillis();
        System.out.println("Sequential Stream Count?= " + eList.stream().filter(e -> e.getSalary() > 15000).count());

        t2 = System.currentTimeMillis();
        System.out.println("Sequential Stream Time Taken?= " + (t2 - t1) + "\n");

        /***** Here We Are Creating A 'Parallel Stream' & Displaying The Result *****/
        t1 = System.currentTimeMillis();
        System.out.println("Parallel Stream Count?= " + eList.parallelStream().filter(e -> e.getSalary() > 15000).count());

        t2 = System.currentTimeMillis();
        System.out.println("Parallel Stream Time Taken?= " + (t2 - t1));

        /***** Here We Are Creating A 'Parallel Stream with Collection.stream.parallel' & Displaying The Result *****/
        t1 = System.currentTimeMillis();
        System.out.println("stream().parallel() Count?= " + eList.stream().parallel().filter(e -> e.getSalary() > 15000).count());

        t2 = System.currentTimeMillis();
        System.out.println("stream().parallel() Time Taken?= " + (t2 - t1));



    }

}

I tried all three ways, .stream(), .parallelStream() and .stream().parallel(), with the same number of records, and measured the time taken by each approach.

Here is the output:

Sequential Stream Count?= 300
Sequential Stream Time Taken?= 18
Parallel Stream Count?= 300
Parallel Stream Time Taken?= 6
stream().parallel() Count?= 300
stream().parallel() Time Taken?= 1

I am not sure why, but as the output shows, the time taken by stream().parallel() is 1/6th of that taken by parallelStream().

Still, any expert suggestions are most welcome.
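One caveat about these numbers: each variant is timed once, in a fixed order, on a cold JVM, so the earlier measurements probably also pay for class loading and lambda setup while the later ones benefit from that warm-up. Below is a rough sketch of a fairer comparison that warms up first and takes the best of several runs. WarmedUpTiming, bestOfNanos and salaries are names made up for this sketch, the warm-up and run counts are arbitrary, and a harness such as JMH would be the more reliable tool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;

class WarmedUpTiming {

    // Runs the task untimed a number of times, then returns the fastest
    // of several timed runs, in nanoseconds.
    static long bestOfNanos(LongSupplier task, int warmups, int runs) {
        for (int i = 0; i < warmups; i++) {
            task.getAsLong();
        }
        long best = Long.MAX_VALUE;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.getAsLong();
            best = Math.min(best, System.nanoTime() - start);
        }
        return best;
    }

    // Same 600 salary values as in the test above, without the Employee wrapper.
    static List<Integer> salaries() {
        int[] base = {20000, 3000, 15002, 7856, 200, 50000};
        List<Integer> list = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            for (int salary : base) {
                list.add(salary);
            }
        }
        return list;
    }

    public static void main(String[] args) {
        List<Integer> data = salaries();
        System.out.println("stream():            "
                + bestOfNanos(() -> data.stream().filter(s -> s > 15000).count(), 1000, 20) + " ns");
        System.out.println("parallelStream():    "
                + bestOfNanos(() -> data.parallelStream().filter(s -> s > 15000).count(), 1000, 20) + " ns");
        System.out.println("stream().parallel(): "
                + bestOfNanos(() -> data.stream().parallel().filter(s -> s > 15000).count(), 1000, 20) + " ns");
    }
}
```

With only 600 elements, the cost of splitting and scheduling may well outweigh any gain from parallelism, so the sequential stream can easily come out fastest here.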
