
How to find the nth prime number in Java using parallel Streams

It seems that parallel() cannot be used when an ordered Stream processes a short-circuiting operation over a numeric range that is difficult to bound. For example:

import java.util.stream.IntStream;

public class InfiniteTest {

    private static boolean isPrime(int x) {
        if (x < 2) {
            return false;
        }
        if (x % 2 == 0 && x > 2) {
            return false;
        }
        // loop while i <= sqrt(x), using multiply for speedup
        for (int i = 3; i * i <= x; i += 2) {
            if (x % i == 0) {
                return false;
            }
        }
        return true;
    }

    private static int findNthPrime(final int n) {
        // must not use infinite stream, causes OOME
        // but even big size causes huge slowdown
        return IntStream.range(1, 1000_000_000)
                // .parallel()
                .filter(InfiniteTest::isPrime)
                .skip(n - 1)
                .findFirst()
                .getAsInt();
    }

    public static void main(String[] args) {
        int n = 1000; // find the nth prime number
        System.out.println(findNthPrime(n));
    }
}

This sequential stream works fine. But when I add parallel(), it seems to run forever (or at least for a very long time). I assume this is because the stream threads work on arbitrary numbers instead of starting with the first numbers in the stream. I cannot usefully bound the range of integers to scan for prime numbers.

So is there any simple trick to run this problem in parallel with streams without falling into that trap, such as forcing the spliterator to serve chunks of work from the beginning of the stream? Or building the stream from substreams that cover increasing number ranges? Or somehow setting up the multithreading as a producer/consumer pattern, but with streams?

Similar questions all seem to just discourage the use of parallel().

Apart from 2 and 3, all prime numbers are of the form 6n-1 or 6n+1. You already treat 2 as a special case in your code. You might want to try treating 3 as special as well:

if (x % 3 == 0) {
    return x == 3;
}

And then run two parallel streams, one testing numbers of the form 6n-1, starting at 5, and the other testing numbers of the form 6n+1, starting at 7. Each stream can advance by six numbers at a time.
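As a rough sketch of the 6n±1 idea (kept sequential here for simplicity; the class name SixKCandidates and its method names are made up for illustration), the candidates can be generated as one ordered stream, with 2 and 3 handled separately:

```java
import java.util.stream.IntStream;

final class SixKCandidates {

    // Assumes x > 3 and x has the form 6k-1 or 6k+1, so neither 2 nor 3 divides it.
    // Only divisors of the form 6j-1 and 6j+1 need to be tried.
    static boolean isPrimeCandidate(int x) {
        for (long i = 5; i * i <= x; i += 6) {
            if (x % i == 0 || x % (i + 2) == 0) {
                return false;
            }
        }
        return true;
    }

    static int findNthPrime(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        if (n == 1) {
            return 2;
        }
        if (n == 2) {
            return 3;
        }
        // Candidates 5, 7, 11, 13, 17, 19, ... in ascending order.
        return IntStream.iterate(1, k -> k + 1)
                .flatMap(k -> IntStream.of(6 * k - 1, 6 * k + 1))
                .filter(SixKCandidates::isPrimeCandidate)
                .skip(n - 3) // 2 and 3 are already accounted for
                .findFirst()
                .getAsInt();
    }
}
```

This only reduces the number of candidates to about a third; it does not by itself make the unbounded search parallel.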

You can use the prime number theorem to estimate the value of the nth prime, and set the limit of your search slightly above that estimate for safety.
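One way to make that limit concrete is sketched below. It assumes the known bound p_n < n (ln n + ln ln n), valid for n >= 6, rather than a plain estimate with a safety margin. The class name BoundedNthPrime and the choice to collect the primes with toArray() are illustrative assumptions, not part of the original suggestion.

```java
import java.util.stream.IntStream;

final class BoundedNthPrime {

    static boolean isPrime(int x) {
        if (x < 2) {
            return false;
        }
        if (x % 2 == 0) {
            return x == 2;
        }
        // long arithmetic avoids overflow of i * i for large x
        for (long i = 3; i * i <= x; i += 2) {
            if (x % i == 0) {
                return false;
            }
        }
        return true;
    }

    // Upper bound for the nth prime: p_n < n * (ln n + ln ln n) for n >= 6.
    static int upperBound(int n) {
        if (n < 6) {
            return 11; // the 5th prime
        }
        double bound = n * (Math.log(n) + Math.log(Math.log(n)));
        if (bound >= Integer.MAX_VALUE) {
            throw new ArithmeticException("bound does not fit into an int");
        }
        return (int) Math.ceil(bound);
    }

    static int findNthPrime(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        // The range is bounded, so every chunk a worker thread picks up is needed.
        int[] primes = IntStream.rangeClosed(2, upperBound(n))
                .parallel()
                .filter(BoundedNthPrime::isPrime)
                .toArray(); // keeps encounter order, i.e. ascending
        return primes[n - 1];
    }
}
```

Because the bound is tight, very little work beyond the nth prime is wasted, at the cost of holding all n primes in an array.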

TL;DR: It is not possible.

It seems that processing unbounded streams in parallel with a short-circuiting method, in order to find the earliest occurrences (in stream order) of anything, is not possible in a useful way ("useful" meaning faster than sequential in terms of the time needed to find the result).

Explanation: I tried a custom implementation of AbstractIntSpliterator that does not split the stream into partitions (1-100, 101-200, ...) but instead splits it in an interleaved fashion ([0, 2, 4, 6, 8, ...], [1, 3, 5, 7, ...]). This works correctly in the sequential case:

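import java.util.Comparator;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.IntConsumer;
import java.util.stream.IntStream;
import java.util.stream.StreamSupport;

/**
 * Provides numbers starting at n. On split, it splits such that the child
 * spliterator and this one provide interleaving numbers.
 */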
public class InterleaveSplitIntSplitIterator extends Spliterators.AbstractIntSpliterator {

    private int current;
    private int increment;

    protected InterleaveSplitIntSplitIterator(int start, int increment) {
        super(Integer.MAX_VALUE,
                        Spliterator.DISTINCT
                        // splitting is interleaved, not prefixing
                        // | Spliterator.ORDERED
                        | Spliterator.NONNULL
                        | Spliterator.IMMUTABLE
                        // SORTED must imply ORDERED
                        // | Spliterator.SORTED
        );
        if (increment == 0) {
            throw new IllegalArgumentException("Increment must be non-zero");
        }
        this.current = start;
        this.increment = increment;
    }

    @Override
    public boolean tryAdvance(IntConsumer action) {
        // Don't benchmark with this on
        // System.out.println(Thread.currentThread() + " " + current);
        action.accept(current);
        current += increment;
        return true;
    }

    // this is required for ORDERED even if sorted() is never called
    @Override
    public Comparator<? super Integer> getComparator() {
        if (increment > 0) {
            return null;
        }
        return Comparator.<Integer>naturalOrder().reversed();
    }

    @Override
    public OfInt trySplit() {
        if (increment >= 2) {
            return null;
        }
        int newIncrement = this.increment * 2;
        int oldIncrement = this.increment;

        this.increment = newIncrement;
        return new InterleaveSplitIntSplitIterator(current + oldIncrement, newIncrement);
    }

    // for convenience
    public static IntStream asIntStream(int start, int increment) {
        return StreamSupport.intStream(
                new InterleaveSplitIntSplitIterator(start, increment),
                /* no, never set parallel here */ false);
    }
}

However, such streams cannot have the Spliterator.ORDERED characteristic, because

If so, this Spliterator guarantees that method {@link #trySplit} splits a strict prefix of elements

and this also means such a stream cannot keep its SORTED characteristic, because

A Spliterator that reports {@code SORTED} must also report {@code ORDERED}
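The prefix rule can be observed on the built-in range spliterator. The following is a small sketch (the class name PrefixSplitDemo is made up for illustration) that splits the spliterator of IntStream.range once and looks at the first element of each part:

```java
import java.util.Spliterator;
import java.util.stream.IntStream;

final class PrefixSplitDemo {

    // Returns {first element of the split-off part, first element of the remainder}.
    static int[] firstElementsAfterSplit() {
        Spliterator.OfInt rest = IntStream.range(0, 100).spliterator();
        Spliterator.OfInt prefix = rest.trySplit();
        if (prefix == null) {
            throw new IllegalStateException("spliterator refused to split");
        }
        int[] firsts = new int[2];
        // Explicit int parameter selects the IntConsumer overload of tryAdvance.
        prefix.tryAdvance((int v) -> firsts[0] = v);
        rest.tryAdvance((int v) -> firsts[1] = v);
        return firsts;
    }

    static boolean rangeIsOrdered() {
        return IntStream.range(0, 100).spliterator()
                .hasCharacteristics(Spliterator.ORDERED);
    }
}
```

Since the range spliterator reports ORDERED, the part returned by trySplit() starts at 0 and the remainder starts somewhere after it, which is exactly what an interleaving split cannot provide.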

So in parallel my spliterator ends up delivering (somewhat) jumbled numbers, which would have to be fixed by sorting before applying a limit, and sorting does not work well with infinite streams (in the general case).

So all solutions to this must use a spliterator that splits off chunks or prefixes of the data, which are then consumed in more or less arbitrary order. This causes many number ranges beyond the actual result to be processed, which in general makes it (much) slower than a sequential solution.

So other than bounding the number range to test, it seems there cannot be a solution using a parallel stream. The problem lies in the specification, which requires ORDERED spliterators to split a stream by prefix, instead of providing a different means of reassembling ordered stream results from multiple spliterators.

However, a solution using a sequential stream with inputs that are processed (and buffered) in parallel may still be possible (but it is not as simple as calling parallel()).
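One possible shape of such a solution is sketched below, under the assumption that a plain sequential loop over fixed-size chunks is acceptable as the outer "stream". Each chunk is a bounded range that is filtered in parallel, and the loop stops at the first chunk containing the nth prime. The class name ChunkedNthPrime and the chunkSize parameter are hypothetical.

```java
import java.util.stream.IntStream;

final class ChunkedNthPrime {

    static boolean isPrime(int x) {
        if (x < 2) {
            return false;
        }
        if (x % 2 == 0) {
            return x == 2;
        }
        for (long i = 3; i * i <= x; i += 2) {
            if (x % i == 0) {
                return false;
            }
        }
        return true;
    }

    static int findNthPrime(int n, int chunkSize) {
        if (n < 1 || chunkSize < 1) {
            throw new IllegalArgumentException("n and chunkSize must be >= 1");
        }
        int found = 0; // primes seen in all earlier chunks
        for (long start = 2; start <= Integer.MAX_VALUE; start += chunkSize) {
            int from = (int) start;
            int to = (int) Math.min(start + chunkSize - 1, Integer.MAX_VALUE);
            // Bounded range: parallel workers never run far ahead of the result.
            int[] primes = IntStream.rangeClosed(from, to)
                    .parallel()
                    .filter(ChunkedNthPrime::isPrime)
                    .toArray(); // ascending, because encounter order is kept
            if (primes.length >= n - found) {
                return primes[n - found - 1];
            }
            found += primes.length;
        }
        throw new ArithmeticException("nth prime does not fit into an int");
    }
}
```

At most one chunk of extra work is done beyond the result, so the chunk size trades wasted work against parallel overhead per chunk; a call such as findNthPrime(1000, 4096) needs only two chunks.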
