My Java Sieve code is slow and not scaling at the expected time complexity

I have written the following 'segmented sieve' program in Java. It takes a range of numbers to sieve, crosses out composite numbers using the 'sieving primes' (the primes ArrayList variable), then returns the prime numbers that have not been crossed out. Here is the code:

public ArrayList<Integer> sieveWorker(int start, int last, ArrayList<Integer> primes) {

    System.out.println("Thread started for range: " + start + "-" + last);
    ArrayList<Integer> nonPrimes = new ArrayList<Integer>();
    ArrayList<Integer> primeNumbers = new ArrayList<Integer>();
    ArrayList<Integer> numbers = new ArrayList<Integer>();

    //numbers to be sieved
    for (int i = start; i <= last; i += 2) {
        numbers.add(i);
    }

    //identifies composites of the sieving primes, then stores them in an arraylist
    for (int i = 0; i < primes.size(); i++) {

        int head = primes.get(i);

        if ((head * head) <= last) {
            if ((head * head) >= start) {
                for (int j = head * head; j <= last; j += head * 2) {
                    nonPrimes.add(j);
                }
            } else {
                int k = Math.round((start - head * head) / (2 * head));
                for (int j = (head * head) + (2 * k * head); j <= last; j += head * 2) {
                    nonPrimes.add(j);
                }
            }
        }

    }

    numbers.removeAll(nonPrimes);
    System.out.println("Primes: " + numbers);
    return numbers;
}

My problem is that it is very slow and performs at a time complexity of O(n^3) instead of the expected time complexity of O(n log log n). I need suggestions on optimising it and correcting its time complexity.

The culprit is the numbers.removeAll(nonPrimes) call, which for each number in numbers (and there are O(n) of them) potentially searches through all of nonPrimes (and there are O(n log log last) of them) to check membership (and nonPrimes is not sorted, either). Here n is the length of numbers, n = last - start.

So instead of an O(1) marking of each non-prime, you have an O(n log log last) actual removal of it, for each of the O(n) of them. Hence the total is above O(n^2) operations overall.

One way to overcome this is to use simple arrays and mark the non-primes. Removal destroys the direct-addressing capability. If you use removal at all, the operations must be on-line, with close to O(1) operations per number. This can be achieved by making the non-primes a sorted list and then, to remove them from numbers, iterating along both lists in linear fashion. Both tasks are, again, easiest done with arrays.
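The sorted-list variant mentioned above could look roughly like the following. This is only a sketch, not the code from the question: the names SortedRemoval and removeSorted are made up for illustration, and it assumes that both numbers and nonPrimes are sorted in ascending order (the composites collected prime by prime would have to be sorted first, for example with Arrays.sort). Duplicates in nonPrimes are tolerated.

```java
import java.util.Arrays;

class SortedRemoval {

    // Returns the elements of `numbers` that do not occur in `nonPrimes`.
    // Both arrays must be sorted ascending; nonPrimes may contain duplicates.
    // Each array is walked once: O(numbers.length + nonPrimes.length).
    static int[] removeSorted(int[] numbers, int[] nonPrimes) {
        int[] kept = new int[numbers.length];
        int count = 0;
        int j = 0; // read position in nonPrimes
        for (int value : numbers) {
            // skip composites that are smaller than the current candidate
            while (j < nonPrimes.length && nonPrimes[j] < value) {
                j++;
            }
            boolean crossedOut = j < nonPrimes.length && nonPrimes[j] == value;
            if (!crossedOut) {
                kept[count++] = value;
            }
        }
        return Arrays.copyOf(kept, count);
    }

    public static void main(String[] args) {
        int[] numbers = {101, 103, 105, 107, 109, 111};
        int[] nonPrimes = {105, 111, 105}; // unsorted, with a duplicate
        Arrays.sort(nonPrimes);            // sorting is a precondition of removeSorted
        // prints [101, 103, 107, 109]
        System.out.println(Arrays.toString(removeSorted(numbers, nonPrimes)));
    }
}
```

Because j never moves backwards, every element of either array is visited a constant number of times, which is the "close to O(1) per number" behaviour described above.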

Explanation

numbers.removeAll(nonPrimes);

must find elements. That is basically contains, and contains on an ArrayList is slow: O(n).

It iterates the whole list from left to right and removes the matching elements. And it does this for every element in your nonPrimes collection. So you will get a complexity of O(n * |nonPrimes|) just for the removeAll part.


Solution

There is an easy fix: exchange your data structure. Structures like HashSet were made for O(1) contains queries. Since you only need add and removeAll on numbers, consider using a HashSet instead, which runs both in O(1) (amortized).

Only change in code:

Set<Integer> numbers = new HashSet<>();
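For context, here is a minimal sketch of how a set-based version of the method might look. Several things in it are assumptions rather than part of the original post: the names HashSetSieve and sieveRange are hypothetical, start is assumed to be odd and at least 3 (as the question's loop implies), and composites are removed from the set one at a time instead of being collected in nonPrimes. Two side effects of swapping the declaration are worth keeping in mind: the method can no longer return numbers directly as an ArrayList<Integer>, and a HashSet has no guaranteed iteration order, so the sketch sorts the survivors before returning them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class HashSetSieve {

    // Sieves the odd numbers in [start, last].
    // Assumptions: start is odd and >= 3; `primes` contains every odd prime
    // up to sqrt(last).
    static List<Integer> sieveRange(int start, int last, List<Integer> primes) {
        Set<Integer> numbers = new HashSet<>();
        for (int i = start; i <= last; i += 2) {
            numbers.add(i);
        }
        for (int p : primes) {
            long square = (long) p * p; // long avoids int overflow for large p
            if (p == 2 || square > last) {
                continue; // no odd multiples to cross out in this range
            }
            // first multiple of p that is >= max(p*p, start) ...
            long first = Math.max(square, ((start + p - 1L) / p) * p);
            if (first % 2 == 0) {
                first += p; // ... moved up to the next odd multiple
            }
            for (long j = first; j <= last; j += 2L * p) {
                numbers.remove(Integer.valueOf((int) j)); // O(1) amortized
            }
        }
        List<Integer> result = new ArrayList<>(numbers);
        Collections.sort(result); // HashSet iteration order is unspecified
        return result;
    }

    public static void main(String[] args) {
        // prints [101, 103, 107, 109, 113, 127, 131]
        System.out.println(sieveRange(101, 131, Arrays.asList(3, 5, 7, 11)));
    }
}
```

Removing elements individually also sidesteps a quirk that exists in at least some JDK releases: AbstractSet.removeAll may call contains on its argument when the set is not larger than that argument, which is linear again if the argument is an ArrayList.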

Another possibility is to make some algorithmic changes. You can avoid the removeAll at the end by marking the elements while you collect them. The advantage is that you can then use arrays. The big advantage of that is that you avoid the boxed Integer class and work directly on primitive ints, which are faster and do not consume as much space. Check the answer of @Will_Ness for details on this approach.
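As a rough illustration of the marking idea, the sketch below keeps one boolean flag per odd candidate and flips it in O(1) whenever a composite is found, so no removal step is needed. It is not taken from either answer: the names ArraySieve and sieveRange are hypothetical, and it assumes that start is odd and at least 3 and that primes contains every odd prime up to sqrt(last).

```java
import java.util.Arrays;

class ArraySieve {

    // Returns the primes among the odd numbers start, start + 2, ... up to last.
    // Assumptions: start is odd and >= 3; `primes` contains every odd prime
    // up to sqrt(last).
    static int[] sieveRange(int start, int last, int[] primes) {
        if (last < start) {
            return new int[0];
        }
        int size = (last - start) / 2 + 1;       // number of odd candidates
        boolean[] composite = new boolean[size]; // composite[i] refers to start + 2*i
        for (int p : primes) {
            long square = (long) p * p;          // long avoids int overflow
            if (p == 2 || square > last) {
                continue;
            }
            // first odd multiple of p that is >= max(p*p, start)
            long first = Math.max(square, ((start + p - 1L) / p) * p);
            if (first % 2 == 0) {
                first += p;
            }
            for (long j = first; j <= last; j += 2L * p) {
                composite[(int) ((j - start) / 2)] = true; // O(1) marking
            }
        }
        // collect the unmarked candidates without any boxing
        int[] result = new int[size];
        int count = 0;
        for (int i = 0; i < size; i++) {
            if (!composite[i]) {
                result[count++] = start + 2 * i;
            }
        }
        return Arrays.copyOf(result, count);
    }

    public static void main(String[] args) {
        // prints [101, 103, 107, 109, 113, 127, 131]
        System.out.println(Arrays.toString(sieveRange(101, 131, new int[]{3, 5, 7, 11})));
    }
}
```

The index arithmetic (j - start) / 2 is what the direct addressing buys: each composite is mapped straight to its slot, so marking the same number several times costs nothing extra.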


Note

Your primeNumbers variable is never used in your method. Consider removing it.
