Generating an infinite parallel stream

Question

Problem

Hi, I have a function that should return an infinite stream of results generated in parallel (yes, parallel generation is much faster in this case). So, obviously (or not), I used

Stream<Something> stream = Stream.generate(this::myGenerator).parallel()

It works; however, it misbehaves when I want to limit the result (everything is fine when the stream is sequential). I mean, it does create results when I write something like

stream.peek(System.out::println).limit(2).collect(Collectors.toList())

but even when the peek output shows more than 10 elements, collect still has not finished (generation is slow, so those 10 can take as long as a minute)... and that is the easy example. Actually, limiting the results is a matter for later, because the main expectation is to receive only results better than the most recent one until the user kills the process. (The other use case is to return the first result I can produce, throwing an exception if nothing else helps; findFirst did not return either, even when there were more elements on the console and no new results for about 30 seconds.)

So, the question is...

How do I cope with that? My other idea was to use RxJava, which raises another question: how can I achieve a similar result with that tool (or another one)?

Code sample

public Stream<Solution> generateSolutions() {
     final Solution initialSolution = initialSolutionMaker.findSolution();
     return Stream.concat(
          Stream.of(initialSolution),
          Stream.generate(continuousSolutionMaker::findSolution)
    ).parallel();
}

new Solver(instance).generateSolutions()
    .map(Solution::getPurpose)
    .peek(System.out::println)
    .limit(5).collect(Collectors.toList());

The implementation of findSolution is not important. It has some side effects, such as adding to the solutions repository (a singleton, synchronized, etc.), but nothing more.

Answer 1

As explained in the already linked answer, the key to an efficient parallel stream is to use a stream source that already has an intrinsic size, instead of taking an unsized or even infinite stream and applying a limit to it. Injecting a size doesn't work with the current implementation at all, while ensuring that a known size doesn't get lost is much easier. Even if the exact size can't be retained, as when applying a filter, the size is still carried along as an estimate.
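To make the difference between a sized and an unsized source concrete, here is a minimal sketch that inspects the source spliterators directly. The class and method names (SizedSourceCheck, generateIsSized, and so on) are hypothetical and exist only for this illustration.

```java
import java.util.Spliterator;
import java.util.stream.IntStream;
import java.util.stream.Stream;

class SizedSourceCheck {

    // Does the source of Stream.generate report an exact size?
    public static boolean generateIsSized() {
        return Stream.generate(() -> 1).spliterator()
                .hasCharacteristics(Spliterator.SIZED);
    }

    // Does the source of IntStream.range report an exact size?
    public static boolean rangeIsSized(int limit) {
        return IntStream.range(0, limit).spliterator()
                .hasCharacteristics(Spliterator.SIZED);
    }

    // Size estimate of the range source; exact when the source is SIZED.
    public static long rangeEstimate(int limit) {
        return IntStream.range(0, limit).spliterator().estimateSize();
    }

    public static void main(String[] args) {
        System.out.println("generate is SIZED: " + generateIsSized());
        System.out.println("range is SIZED:    " + rangeIsSized(5));
        System.out.println("range estimate:    " + rangeEstimate(5));
    }
}
```

The range source knows how many elements it holds, so the parallel machinery can split the work up front instead of coordinating a limit across tasks.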

So instead of

Stream.generate(this::myGenerator).parallel()
      .peek(System.out::println)
      .limit(2)
      .collect(Collectors.toList())

just use

IntStream.range(0, /* limit */ 2).unordered().parallel()
         .mapToObj(unused -> this.myGenerator())
         .peek(System.out::println)
         .collect(Collectors.toList())

Or, closer to your sample code

public Stream<Solution> generateSolutions(int limit) {
    final Solution initialSolution = initialSolutionMaker.findSolution();
    return Stream.concat(
         Stream.of(initialSolution),
         IntStream.range(1, limit).unordered().parallel()
               .mapToObj(unused -> continuousSolutionMaker.findSolution())
   );
}

new Solver(instance).generateSolutions(5)
    .map(Solution::getPurpose)
    .peek(System.out::println)
    .collect(Collectors.toList());
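As a rough check of the sized-range approach, the following sketch counts how often the generator runs. Here expensiveResult is a hypothetical stand-in for findSolution, and the CALLS counter exists only for this demonstration.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class SizedRangeDemo {

    // Counts how often the (hypothetical) generator was invoked.
    public static final AtomicInteger CALLS = new AtomicInteger();

    // Hypothetical stand-in for an expensive generator such as findSolution().
    static String expensiveResult() {
        return "result-" + CALLS.incrementAndGet();
    }

    // Produces exactly `limit` results in parallel from a sized source.
    public static List<String> generate(int limit) {
        return IntStream.range(0, limit).unordered().parallel()
                .mapToObj(unused -> expensiveResult())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> results = generate(5);
        System.out.println(results.size() + " results from "
                + CALLS.get() + " generator calls");
    }
}
```

Because every index in the range is mapped exactly once, the generator runs as many times as there are requested results, and no surplus work is thrown away.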

Answer 2

Unfortunately, this is expected behavior. As far as I remember, I've seen at least two topics on this matter; here is one of them.

The idea is that Stream.generate creates an unordered infinite stream, and limit will not introduce the SIZED flag. Because of this, when you spawn a parallel execution on that stream, the individual tasks have to synchronize to see whether they have reached the limit; by the time that synchronization happens, multiple elements may already have been processed. For example, this:

 Stream.iterate(0, x -> x + 1)
            .peek(System.out::println)
            .parallel()
            .limit(2)
            .collect(Collectors.toList());

and this:

IntStream.of(1, 2, 3, 4)
            .peek(System.out::println)
            .parallel()
            .limit(2)
            .boxed()
            .collect(Collectors.toList());

will always produce two elements in the List (Collectors.toList) and will also always output two elements (via peek).

On the other hand, this:

Stream<Integer> stream = Stream.generate(new Random()::nextInt).parallel();

List<Integer> list = stream
            .peek(x -> {
                System.out.println("Before " + x);
            })
            .map(x -> {
                System.out.println("Mapping x " + x);
                return x;
            })
            .peek(x -> {
                System.out.println("After " + x);
            })
            .limit(2)
            .collect(Collectors.toList());

will produce two elements in the List, but it may process many more that are later discarded by the limit. This is what you are actually seeing in your example.
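The over-generation can be measured rather than read off the console. This is a small sketch, assuming a counter as the generator; the names OverGenerationDemo, CALLS, and limited are hypothetical. The number of generator calls varies from run to run, so the sketch only prints it.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class OverGenerationDemo {

    // Counts every element the infinite source actually generated.
    public static final AtomicInteger CALLS = new AtomicInteger();

    // Collects `limit` elements from an infinite, unsized parallel stream.
    public static List<Integer> limited(int limit) {
        return Stream.generate(CALLS::incrementAndGet)
                .parallel()
                .limit(limit)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> list = limited(2);
        // The list always has 2 elements; the call count may be larger.
        System.out.println("collected " + list.size()
                + ", generated " + CALLS.get());
    }
}
```

With a slow generator, each surplus call is time spent on a result that limit discards, which would explain why collect returns so late in the question's example.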

The only sane way of doing that (as far as I can tell) would be to create a custom Spliterator. I have not written many of them, but here is my attempt:

 static class LimitingSpliterator<T> implements Spliterator<T> {

    private int limit;

    private final Supplier<T> generator;

    private LimitingSpliterator(Supplier<T> generator, int limit) {
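        // Plain-Java argument check (the original used Guava's Preconditions.checkArgument).
        if (limit <= 0) {
            throw new IllegalArgumentException("limit must be positive");
        }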
        this.limit = limit;
        this.generator = Objects.requireNonNull(generator);
    }

    @Override
    public boolean tryAdvance(Consumer<? super T> consumer) {
        if (limit == 0) {
            return false;
        }
        T nextElement = generator.get();
        --limit;
        consumer.accept(nextElement);
        return true;
    }

    @Override
    public LimitingSpliterator<T> trySplit() {

        if (limit <= 1) {
            return null;
        }

        int half = limit >> 1;
        limit = limit - half;
        return new LimitingSpliterator<>(generator, half);
    }

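    @Override
    public long estimateSize() {
        // A SIZED spliterator must report the exact number of remaining elements.
        return limit;
    }

    @Override
    public int characteristics() {
        // Every split is itself exactly sized, so SUBSIZED applies as well.
        return SIZED | SUBSIZED;
    }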
}

And the usage would be:

 StreamSupport.stream(new LimitingSpliterator<>(new Random()::nextInt, 7), true)
            .peek(System.out::println)
            .collect(Collectors.toList());
