
Can I optimize my prime number summing function with generators?

Problem: While the following code works, it takes far too long to be of any use in finding the sum of all primes below 2,000,000.

Past attempts: I've tried implementing while loops, counters, and a number of other tools to modify the code, but they end up changing my results as well. Previously, I simply added the numbers to an existing variable instead of appending them to a list, but the result was the same.

I believe that a generator function or expression will solve the problem, but I've had trouble implementing the function, the expression, or both.

# Prime number determiner
def is_prime(x):
    for i in range(2, x-1):
        if x % i == 0:
            return False
    else:
        return True

# Sum all prime numbers between 2 and 2,000,000
primes = []
for i in range(2, 2000000):
    if is_prime(i):
        primes.append(i)
results = sum(primes)
print(results)

Previous attempt at generator expressions/functions:

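# Generator version of above
# Note: this attempt does not work as intended. Calling is_prime_gen(j) returns a
# generator object, which is always truthy, so every j passes the filter; and
# sum_prime is never defined (adding a generator to a number would fail anyway).
def is_prime_gen(x):
    yield (i for i in range(2, x-1) if x % i == 0)
sum_prime += (j for j in range(2, 2000000) if is_prime_gen(j))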

Expected results: I don't need the result to be computed super fast, but I would like it to finish within a minute or two.

Bonus: For anyone responding, it would help me if you could also explain how you came to your conclusions (while "experience" is a valid explanation, it isn't a helpful one).

Your focus on writing a generator function is an example of the XY problem. You've decided that the solution to your code's performance problems is to use a generator, but that's not actually correct. When you get answers that don't involve generators, you think they're not helpful, and the rest of us are a bit confused about why generators are relevant in the first place.

Let's examine why you're having performance issues. The main problem is that your code takes O(n) time to determine whether each number n is prime (most composites exit the loop early, but every actual prime runs it to the end). You have to do this for every number from two up to whatever your limit is. This means the whole algorithm takes roughly O(N**2) time, where N is the largest number to check (e.g. two million). For a large N your code will take a very long time.
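To see the quadratic growth concretely, here is a small sketch (the function name count_divisions is made up for illustration) that runs the question's trial-division loop and counts how many x % i operations it performs. Doubling the limit should multiply the count by a bit less than 4, which matches O(N**2) growth (more precisely, about N**2 / log N, since only the primes run the full loop).

```python
def count_divisions(limit):
    """Count the x % i tests the question's is_prime loop performs for 2 <= x < limit."""
    total = 0
    for x in range(2, limit):
        for i in range(2, x - 1):  # same bounds as the original is_prime
            total += 1
            if x % i == 0:
                break  # composites usually exit early...
    return total       # ...but each prime p >= 3 costs p - 3 tests

for limit in (1_000, 2_000, 4_000):
    print(limit, count_divisions(limit))
```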

Using a generator for your primes won't, by itself, improve that. It will still take just as long to figure out whether each candidate value is prime, and you still need to check all the same numbers if you stick with your current algorithm. At best it would be as good as adding the prime numbers immediately to a running sum, rather than putting them into a list and summing at the end. That is, it could save you memory, but not time.
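A quick way to check the "memory, not time" point is to build the same primes both ways. This sketch uses its own simple trial-division is_prime (an assumption, not the question's function) and compares sys.getsizeof() of a list against a generator expression. sys.getsizeof() reports only the container's own size, not the integers held by the list, so the real gap is even larger.

```python
import sys

def is_prime(x):
    """Trial division by 2..sqrt(x); a stand-in helper for this illustration."""
    if x < 2:
        return False
    for d in range(2, int(x ** 0.5) + 1):
        if x % d == 0:
            return False
    return True

LIMIT = 10_000
primes_list = [n for n in range(2, LIMIT) if is_prime(n)]  # every prime held at once
primes_gen = (n for n in range(2, LIMIT) if is_prime(n))   # produces one prime at a time

print(sys.getsizeof(primes_list), sys.getsizeof(primes_gen))
# Same answer and the same number of is_prime calls either way: only memory differs.
print(sum(primes_list) == sum(primes_gen))
```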

The real way to greatly improve your performance is to use a smarter algorithm that does less work. There are a bunch of good ways to find primes in less time. Some can be implemented as generators, but in this situation, computing all the primes at once and using extra memory in exchange for better performance is probably a reasonable trade-off.

That's because your computer can hold hundreds of millions of integers in memory at one time. On 64-bit CPython, integers below 2**30 (about a billion) use 28 bytes each, so two million of them take around 56 MB, plus about 18 MB more for the list data structure. So you can use a memory-intensive algorithm without needing to worry about your memory usage.
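The memory estimate can be reproduced with sys.getsizeof(). This sketch assumes 64-bit CPython (other interpreters report different sizes) and counts the int objects plus the list's own array of pointers. A list grown with append() over-allocates by roughly an eighth, which is about where the extra couple of megabytes beyond this exact-size list come from; the helper name is made up.

```python
import sys

def estimate_int_list_mb(count, sample=1_000_000):
    """Rough size in MB of a list holding `count` distinct small ints (64-bit CPython)."""
    per_int = sys.getsizeof(sample)              # 28 bytes for ints below 2**30 on 64-bit CPython
    pointer_array = sys.getsizeof([0] * count)   # the list object: 8 bytes per slot plus a header
    return (count * per_int + pointer_array) / 1_000_000

print(round(estimate_int_list_mb(2_000_000)))  # 72 on 64-bit CPython (56 MB ints + 16 MB pointers)
```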

Here's a very fast pure-Python implementation of the Sieve of Eratosthenes for computing all of the primes less than N. The implementation was originally by Robert William Hanks in this answer, but this version was tweaked a bit by Bruno Astrolino in this answer to work a little more efficiently in Python 3.6+.

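from itertools import compress

def rwh_primes1v1(n):
    """ Returns a list of primes < n for n > 2 """
    # sieve[k] represents the odd number 2*k + 1 (even numbers are never stored)
    sieve = bytearray([True]) * (n//2)
    for i in range(3,int(n**0.5)+1,2):
        if sieve[i//2]:
            # Cross out i*i, i*i + 2i, i*i + 4i, ... (only odd multiples of i)
            sieve[i*i//2::i] = bytearray((n-i*i-1)//(2*i)+1)
    # sieve[0] stands for 1, which is not prime, so it is skipped
    return [2,*compress(range(3,n,2), sieve[1:])]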

You would run sum(rwh_primes1v1(2_000_000)). On my computer that takes about 30 milliseconds, while your code takes 30 seconds (1000 times longer) for N=100_000, a bound twenty times smaller. I wasn't willing to wait the three hours or so that the inefficient algorithm would need for N=2_000_000.
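For the "how did you come to your conclusions" part of the question: measure. This sketch times the question's trial-division loop against a plain Sieve of Eratosthenes using timeit. The plain sieve (one bytearray slot per integer, no odd-only trick) stands in for rwh_primes1v1 so the block is self-contained; both function names are made up for illustration, and absolute timings will vary by machine.

```python
import math
import timeit

def naive_prime_sum(limit):
    """The question's approach: test every x by dividing by 2..x-2."""
    total = 0
    for x in range(2, limit):
        for i in range(2, x - 1):
            if x % i == 0:
                break
        else:  # loop finished without finding a divisor
            total += x
    return total

def sieve_prime_sum(limit):
    """Plain Sieve of Eratosthenes: sum of the primes below limit."""
    if limit < 3:
        return 0
    flags = bytearray([1]) * limit  # flags[i] == 1 means "i may still be prime"
    flags[0] = flags[1] = 0
    for p in range(2, math.isqrt(limit - 1) + 1):
        if flags[p]:
            # Cross out p*p, p*p + p, ...; smaller multiples were crossed out by smaller primes
            flags[p * p::p] = bytes(len(range(p * p, limit, p)))
    return sum(i for i, flag in enumerate(flags) if flag)

LIMIT = 5_000
assert naive_prime_sum(LIMIT) == sieve_prime_sum(LIMIT)  # same answer first, then speed
for fn in (naive_prime_sum, sieve_prime_sum):
    seconds = timeit.timeit(lambda: fn(LIMIT), number=1)
    print(f"{fn.__name__}: {seconds:.4f} s")
```

Checking that both functions agree before timing them is the important habit: a fast wrong answer is easy to write.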

Note that if you really do want a generator that yields the primes for some other reason, there are some good implementations of infinite prime generators in the answers to this question. It's unlikely that using any of them for your summing problem will produce faster code than what I provided above, until you get to such a large N that you can't fit the whole sieve in memory at once (and only some of the generators will help with that; some have significant memory overheads themselves).
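The linked answers aren't reproduced here, but as an illustration of what an infinite prime generator can look like, here is a sketch of one common technique: an incremental Sieve of Eratosthenes that keeps a dict of upcoming composites. Its dict grows with the number of primes found so far, which is the kind of memory overhead mentioned above.

```python
from itertools import count, takewhile

def infinite_primes():
    """Yield 2, 3, 5, 7, ... forever using an incremental Sieve of Eratosthenes."""
    upcoming = {}  # next composite -> list of primes that will "cross it out"
    for n in count(2):
        if n not in upcoming:
            yield n                # nothing crossed n out, so it is prime
            upcoming[n * n] = [n]  # the first composite worth tracking for n is n*n
        else:
            for p in upcoming.pop(n):  # n is composite: move each witness to its next multiple
                upcoming.setdefault(n + p, []).append(p)

print(sum(takewhile(lambda p: p < 100, infinite_primes())))  # 1060
```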

I think the key problem is how to find all the prime numbers quickly and correctly, and there are many answers about that. I found this one:

def isprime(n):
    """Returns True if n is prime."""
    if n == 2:
        return True
    if n == 3:
        return True
    if n % 2 == 0:
        return False
    if n % 3 == 0:
        return False

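    # Every prime above 3 has the form 6k - 1 or 6k + 1, so test only 5, 7, 11, 13, ...
    i = 5
    w = 2

    while i * i <= n:
        if n % i == 0:
            return False

        i += w
        w = 6 - w  # alternate steps of 2 and 4

    return True

total = 0  # avoid shadowing the built-in sum()
for n in range(2, 20000):
    if isprime(n):
        total += n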

For range(2, 10000), the time cost in seconds is:

0.0043639220002660295  # this answer
0.25401434600007633  # your answer

For range(2, 100000), the time cost in seconds is:

0.1730230279999887  # this answer
19.639503588000025  # your answer
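The trick in this isprime() is the `i += w; w = 6 - w` pair: w alternates between 2 and 4, so i runs through 5, 7, 11, 13, 17, ..., the numbers of the form 6k ± 1. Every prime above 3 has that form, so the loop skips all multiples of 2 and 3 and tests only about a third of the possible divisors. A small sketch (the generator name is made up) that prints the divisor sequence:

```python
from itertools import islice

def six_k_plus_minus_one(start=5):
    """Yield 5, 7, 11, 13, ... using the same step rule as isprime()."""
    i, w = start, 2
    while True:
        yield i
        i += w
        w = 6 - w  # flip the step between 2 and 4

print(list(islice(six_k_plus_minus_one(), 8)))  # [5, 7, 11, 13, 17, 19, 23, 25]
```

25 shows up because the sequence only excludes multiples of 2 and 3; testing it as a divisor is harmless, just redundant.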

import time

def is_prime(num):
    if num == 2:
        return True
    if num == 3:
        return True
    if num % 2 == 0:
        return False
    if num % 3 == 0:
        return False

    i = 5
    w = 2

    while i * i <= num:
        if num % i == 0:
            return False

        i += w
        w = 6 - w

    return True

start = time.perf_counter()
prime = (i for i in range(2, 2000000) if is_prime(i))
print(sum(prime))
print(time.perf_counter() - start)  # elapsed time in seconds

I'm no expert, but I think this should work and is quite simple to understand. I used the improved function that ToughMind shared. It takes my system 15.5 seconds to calculate the sum.

The answer to this question comes down to what you mean by optimize. A generator can be used to optimize space usage. Where you waste space is this logic in your main code:

primes.append(i)

Your is_prime() function doesn't waste space. Generators only save time when the sequence computation can be stopped early, rather than being created completely and then only partially used. That isn't the case here.
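To make the "stopped early" case concrete: this sketch adds a call counter to a trial-division is_prime (an illustration, not code from the question), then finds the first prime above 1000 lazily with next() on a generator expression, and again eagerly by building a list first.

```python
from itertools import count

calls = 0

def is_prime(number):
    """Trial division by odd divisors up to sqrt(number); counts how often it runs."""
    global calls
    calls += 1
    if number <= 2 or number % 2 == 0:
        return number == 2
    for divisor in range(3, int(number ** 0.5) + 1, 2):
        if number % divisor == 0:
            return False
    return True

calls = 0
lazy_first = next(n for n in count(1001) if is_prime(n))  # stops at the first hit
lazy_calls = calls

calls = 0
eager_first = [n for n in range(1001, 2_000) if is_prime(n)][0]  # tests all 999 numbers
eager_calls = calls

print(lazy_first, lazy_calls)    # 1009 9
print(eager_first, eager_calls)  # 1009 999
```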

Here's a simple rework that cleans up your is_prime() implementation time-wise and uses a generator expression to avoid creating a list of primes:

def is_prime(number):
    if number <= 2 or number % 2 == 0:
        return number == 2

    for divisor in range(3, int(number ** 0.5) + 1, 2):
        if number % divisor == 0:
            return False

    return True

result = sum(number for number in range(2, 2_000_000) if is_prime(number))

print(result)

This completes the task in about 10 seconds, well within your one-to-two-minute limit, and doesn't take much code. It isn't optimal time-wise, just better time-wise, and it is reasonably optimal space-wise.

REVISIT

There is another way a generator can provide a time improvement beyond what I described above. Unlike is_prime(), which can be passed any number at any time, a generator can guarantee that it will be working with ascending numbers, so it can make simplifying assumptions. Similarly, it can maintain state between calls, unlike is_prime() as implemented. Let's rework this problem by generating the primes:

def prime_generator(limit):
    yield 2

    number = 3

    while number <= limit:
        for divisor in range(3, int(number ** 0.5) + 1, 2):
            if number % divisor == 0:
                break
        else:  # no break
            yield number

        number += 2

print(sum(prime_generator(2_000_000)))

After playing with various arrangements of this approach, it provides at best a 5% speedup over my original solution.
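The prime_generator() above still tries every odd divisor. A sketch that leans harder on the state argument from REVISIT: because the numbers arrive in ascending order, the generator can remember the odd primes it has already found and trial-divide only by those. Only primes up to sqrt(limit) are ever needed as divisors, so it stores just those. The function name is made up, and math.isqrt needs Python 3.8+.

```python
from math import isqrt

def prime_generator_with_state(limit):
    """Yield the primes <= limit, dividing only by odd primes found earlier."""
    if limit < 2:
        return
    yield 2
    divisors = []  # odd primes <= sqrt(limit): the state kept between yields
    for number in range(3, limit + 1, 2):
        root = isqrt(number)
        is_prime = True
        for p in divisors:
            if p > root:
                break            # no prime divisor up to sqrt(number): prime
            if number % p == 0:
                is_prime = False  # found a divisor: composite
                break
        if is_prime:
            if number * number <= limit:
                divisors.append(number)  # only small primes are ever needed as divisors
            yield number

print(sum(prime_generator_with_state(2_000_000)))
```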

THE SIEVE

Finally, let's solve this problem using a sieve. This approach uses more space than the solutions above to gain performance time-wise:

def sum_primes(limit):  # assumes limit > 1
    sieve = [False, False, True] + [True, False] * ((limit - 1) // 2)
    number = 3
    result = 2

    while number <= limit:
        if sieve[number]:
            result += number

            for i in range(number * number, limit + 1, number):
                sieve[i] = False
        number += 2

    return result

print(sum_primes(2_000_000))

This sums the primes in less than 1 second on my system. It's 15x faster than the previous generator-based solution.

Here is a generator that uses a hybrid bootstrap approach. It uses a (not particularly efficient) sieve to identify the primes below the square root of n, storing them as it yields them, and then uses these for trial division of the remaining odd numbers up to n. For n = 2_000_000, it never stores more than around 700 numbers, so it has a smallish memory footprint (at the cost of more processing time):

import math

def primes(n):
    k = 1 + int(math.sqrt(n))
    #phase 1: sieve to k
    if n >= 2:
        yield 2
        small_primes = [2]
        candidates = [2*i + 1 for i in range(1,(k+1)//2)]
        while len(candidates) > 0:
            p = candidates[0]
            small_primes.append(p)
            candidates = [x for x in candidates if x % p != 0]
            yield p
    #at this stage we have all primes below k
    #loop through remaining odd numbers
    #dividing by these primes
    if k%2 == 0: k +=1
    while k <= n:
        if all(k%p != 0 for p in small_primes): yield k
        k += 2

I didn't bother to time it carefully, but sum(primes(2_000_000)) takes around 3 seconds. The reason I didn't bother is that I didn't want to embarrass it compared to the code of Blckkght, which shows just how fast an optimized non-generator sieve approach can be.
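The "around 700 numbers" figure can be checked directly. This sketch (a hypothetical helper that reuses the same k and candidate list as primes(n)) reports how many odd candidates phase 1 holds at the start and how many small primes it ends up keeping.

```python
import math

def phase_one_sizes(n):
    """Sizes of the phase-1 data in primes(n): (initial odd candidates, small primes kept)."""
    k = 1 + int(math.sqrt(n))                                  # same bound as primes(n)
    candidates = [2 * i + 1 for i in range(1, (k + 1) // 2)]   # odd numbers 3..k, as in primes(n)
    odd_primes = [c for c in candidates
                  if all(c % d for d in range(3, math.isqrt(c) + 1, 2))]
    return len(candidates), 1 + len(odd_primes)                # +1 for the prime 2

print(phase_one_sizes(2_000_000))
```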

Here is a very fast pure Python prime generator created by Willy Good, found in a comment here. It may be overkill in performance and complexity for your particular use case, but I don't think many of the people on Stack Overflow working with primes in Python are aware of it.

def primes235(limit):
    yield 2; yield 3; yield 5
    if limit < 7: return
    modPrms = [7,11,13,17,19,23,29,31]
    gaps = [4,2,4,2,4,6,2,6,4,2,4,2,4,6,2,6] # 2 loops for overflow
    ndxs = [0,0,0,0,1,1,2,2,2,2,3,3,4,4,4,4,5,5,5,5,5,5,6,6,7,7,7,7,7,7]
    lmtbf = (limit + 23) // 30 * 8 - 1 # integral number of wheels rounded up
    lmtsqrt = (int(limit ** 0.5) - 7)
    lmtsqrt = lmtsqrt // 30 * 8 + ndxs[lmtsqrt % 30] # round down on the wheel
    buf = [True] * (lmtbf + 1)
    for i in range(lmtsqrt + 1):
        if buf[i]:
            ci = i & 7; p = 30 * (i >> 3) + modPrms[ci]
            s = p * p - 7; p8 = p << 3
            for j in range(8):
                c = s // 30 * 8 + ndxs[s % 30]
                buf[c::p8] = [False] * ((lmtbf - c) // p8 + 1)
                s += p * gaps[ci]; ci += 1
    for i in range(lmtbf - 6 + (ndxs[(limit - 7) % 30])): # adjust for extras
        if buf[i]: yield (30 * (i >> 3) + modPrms[i & 7])

A speed comparison with Robert William Hanks' best pure Python solution, which is more compact and easier to understand:

$ time ./prime_rwh2.py 1e7
664579 primes found < 1e7

real    0m0.883s
user    0m0.266s
sys     0m0.047s
$ time ./prime_wheel.py 1e7
664579 primes found < 1e7

real    0m0.285s
user    0m0.234s
sys     0m0.063s

Willy Good's solution is a mod 30 wheel sieve, which avoids using/storing multiples of 2, 3, and 5, except for manually yielding 2, 3, and 5 themselves to make the output complete. It works great for me up to about 2.5e9, where the 8 GB of RAM in my laptop is completely used up and the system starts thrashing.
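Where the wheel's magic numbers come from: only 8 of every 30 consecutive integers are coprime to 30 = 2 * 3 * 5, so a mod 30 wheel needs 8 flags per 30 numbers instead of 30. This sketch recomputes those residues and the gaps between them. The code shifts residue 1 to 31 so its modPrms list starts at 7, and the gaps below match the first eight entries of its gaps list; the variable names here are made up.

```python
from math import gcd

WHEEL = 2 * 3 * 5  # 30
residues = [r for r in range(1, WHEEL + 1) if gcd(r, WHEEL) == 1]
print(residues)  # [1, 7, 11, 13, 17, 19, 23, 29]

mod_prms = residues[1:] + [residues[0] + WHEEL]  # [7, 11, ..., 29, 31], like modPrms
gaps = [b - a for a, b in zip(mod_prms, mod_prms[1:] + [mod_prms[0] + WHEEL])]
print(gaps)  # [4, 2, 4, 2, 4, 6, 2, 6]
print(f"fraction of numbers kept: {len(residues)}/{WHEEL} = {len(residues) / WHEEL:.3f}")
```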

The result of get_sum_of_primes_in_range(0, constant_max_value) will be a constant, which can be precomputed.

The result of get_sum_of_primes_in_range(0, x) can be computed as get_sum_of_primes_in_range(0, n) + get_sum_of_primes_in_range(n, x) for any n <= x.

By combining these two facts, you can keep a table of precomputed results for selected values of n and only spend processing time on the get_sum_of_primes_in_range(n, x) part.

Basically, instead of computing get_sum_of_primes_in_range(0, x), you can do k = x // 100, n = k * 100, result = table[k] + get_sum_of_primes_in_range(n, x), where table[k] holds the precomputed value of get_sum_of_primes_in_range(0, k * 100), and skip a massive amount of work. How much work you can expect to skip (on average) depends on how large you make that table of precomputed results.

For get_sum_of_primes_in_range(n, x) you want something based on the Sieve of Eratosthenes (see https://en.wikipedia.org/wiki/Sieve_of_Eratosthenes). Note that the Sieve of Eratosthenes can be started from an arbitrary value using modulo and needn't start from 0.
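Putting the two ideas together, here is a minimal sketch. get_sum_of_primes_in_range(start, end) is the answer's hypothetical name; in this sketch it sums primes p with start <= p < end by sieving only that segment, starting each prime's crossing-out at its first multiple inside the segment (found with integer division, which is equivalent to the modulo trick). The table step of 100 and the 10_000 maximum are arbitrary assumptions for the example.

```python
from math import isqrt

def small_primes_upto(n):
    """Primes <= n via a plain Sieve of Eratosthenes."""
    if n < 2:
        return []
    flags = bytearray([1]) * (n + 1)
    flags[0] = flags[1] = 0
    for p in range(2, isqrt(n) + 1):
        if flags[p]:
            flags[p * p::p] = bytes(len(range(p * p, n + 1, p)))
    return [i for i, f in enumerate(flags) if f]

def get_sum_of_primes_in_range(start, end):
    """Sum of primes p with start <= p < end, sieving only the segment [start, end)."""
    start = max(start, 2)
    if start >= end:
        return 0
    segment = bytearray([1]) * (end - start)  # segment[i] stands for start + i
    for p in small_primes_upto(isqrt(end - 1)):
        # First multiple of p inside the segment, but never below p*p (so p itself survives)
        first = max(p * p, (start + p - 1) // p * p)
        segment[first - start::p] = bytes(len(range(first - start, end - start, p)))
    return sum(start + i for i, f in enumerate(segment) if f)

STEP = 100          # hypothetical table granularity
MAX_VALUE = 10_000  # hypothetical largest x the table supports

# table[k] = sum of primes below k * STEP, computed once up front
table = [0]
for k in range(1, MAX_VALUE // STEP + 1):
    table.append(table[-1] + get_sum_of_primes_in_range((k - 1) * STEP, k * STEP))

def sum_of_primes_below(x):
    k = x // STEP  # integer division
    n = k * STEP
    return table[k] + get_sum_of_primes_in_range(n, x)

print(sum_of_primes_below(1_000))  # 76127
```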

First, I have a math background. Second, this uses Fermat's Little Theorem (though I am unsure of the name; I had forgotten it). I just Googled and spent a lot of time coding and debugging. Here it is!

def is_prime():
    # Despite the name, this prints every prime factor of the number entered
    a = int(input("Enter any number: "))

    for p in range(2, a + 1):
        if a % p == 0:
            # p divides a; now check whether p itself is prime by trial division
            isprime = 1
            for j in range(2, p // 2 + 1):
                if p % j == 0:
                    isprime = 0
                    break

            if isprime == 1:
                print("%d is a prime factor of %d" % (p, a))

is_prime()
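The posted function reads from input() and prints, which makes it hard to reuse or test. Here is a sketch of the same trial-division idea as a function that returns the distinct prime factors instead (the name prime_factors is made up). Like the posted code, it finds the factors by plain trial division.

```python
def prime_factors(a):
    """Return the distinct prime factors of a (a >= 2), smallest first."""
    factors = []
    for p in range(2, a + 1):
        if a % p == 0:
            # Same primality check as the post: no divisor j in 2..p//2
            if all(p % j != 0 for j in range(2, p // 2 + 1)):
                factors.append(p)
    return factors

print(prime_factors(84))  # [2, 3, 7]
```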

Have a great day!

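Notice: The technical posts on this site are licensed under CC BY-SA 4.0. If you need to repost, please cite this site's URL or the original address. For any questions, contact yoyou2525@163.com.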

 
Guangdong ICP Filing No. 18138465  © 2020-2024 STACKOOM.COM