Optimizing Prime Number Python Code

Question

I'm relatively new to the python world, and the coding world in general, so I'm not really sure how to go about optimizing my python script. The script that I have is as follows:

import math
z = 1
x = 0
while z != 0:
    x = x+1
    if x == 500:
        z = 0
    calculated = open('Prime_Numbers.txt', 'r')
    readlines = calculated.readlines()
    calculated.close()
    a = len(readlines)
    b = readlines[(a-1)]

    b = int(b) + 1
    for num in range(b, (b+1000)):
        prime = True
        calculated = open('Prime_Numbers.txt', 'r')
        for i in calculated:
            i = int(i)
            q = math.ceil(num/2)
            if (q%i==0):
                prime = False
        if prime:
            calculated.close()
            writeto = open('Prime_Numbers.txt', 'a')
            num = str(num)
            writeto.write("\n" + num)
            writeto.close()
            print(num)

As some of you can probably guess I'm calculating prime numbers. The external file that it calls on contains all the prime numbers between 2 and 20. The reason that I've got the while loop in there is that I wanted to be able to control how long it ran for.

If you have any suggestions for cutting out any clutter in there could you please respond and let me know, thanks.

Answer 1

Reading and writing to files is very, very slow compared to operations with integers. Your algorithm can be sped up 100-fold by just ripping out all the file I/O:

import itertools

primes = {2}  # A set containing only 2

for n in itertools.count(3):  # Start counting from 3, by 1
    for prime in primes:      # For every prime less than n
        if n % prime == 0:    # If it divides n
            break             # Then n is composite
    else:
        primes.add(n)         # Otherwise, it is prime
        print(n)

A much faster prime-generating algorithm would be a sieve. Here's the Sieve of Eratosthenes, in Python 3:

end = int(input('Generate primes up to: '))
numbers = {n: True for n in range(2, end)}  # Assume every number is prime, and then

for n, is_prime in numbers.items():         # (Python 3 only)
    if not is_prime:
        continue                            # For every prime number

    for i in range(n ** 2, end, n):         # Cross off its multiples
        numbers[i] = False

    print(n)

Answer 2

It is very inefficient to keep storing and loading all primes from a file. In general file access is very slow. Instead save the primes to a list or deque. For this initialize calculated = deque() and then simply add new primes with calculated.append(num) . At the same time output your primes with print(num) and pipe the result to a file.

When you found out that num is not a prime, you do not have to keep checking all the other divisors. So break from the inner loop:

if q%i == 0:
    prime = False
    break

You do not need to go through all previous primes to check for a new prime. Since each non-prime needs to factorize into two integers, at least one of the factors has to be smaller or equal sqrt(num) . So limit your search to these divisors.

Also the first part of your code irritates me.

z = 1
x = 0
while z != 0:
    x = x+1
    if x == 500:
        z = 0

This part seems to do the same as:

for x in range(500):

Also you limit with x to 500 primes, why don't you simply use a counter instead, that you increase if a prime is found and check for at the same time, breaking if the limit is reached? This would be more readable in my opinion.

In general you do not need to introduce a limit. You can simply abort the program at any point in time by hitting Ctrl+C .

However, as others already pointed out, your chosen algorithm will perform very poor for medium or large primes. There are more efficient algorithms to find prime numbers: https://en.wikipedia.org/wiki/Generating_primes , especially https://en.wikipedia.org/wiki/Sieve_of_Eratosthenes .

Answer 3

You're writing a blank line to your file, which is making int() traceback. Also, I'm guessing you need to rstrip() off your newlines.

I'd suggest using two different files - one for initial values, and one for all values - initial and recently computed.

If you can keep your values in memory a while, that'd be a lot faster than going through a file repeatedly. But of course, this will limit the size of the primes you can compute, so for larger values you might return to the iterate-through-the-file method if you want.

For computing primes of modest size, a sieve is actually quite good, and worth a google.

When you get into larger primes, trial division by the first n primes is good, followed by m rounds of Miller-Rabin. If Miller-Rabin probabilistically indicates the number is probably a prime, then you do complete trial division or AKS or similar. Miller Rabin can say "This is probably a prime" or "this is definitely composite". AKS gives a definitive answer, but it's slower.

FWIW, I've got a bunch of prime-related code collected together at http://stromberg.dnsalias.org/~dstromberg/primes/

Optimizing Prime Number Python Code

Question

3 answers

solution1
3 2014-01-02 07:22:53

solution2
1 2014-01-02 07:16:49

solution3
0 2014-01-02 06:45:01

Optimizing Prime Number Python Code

Question

3 answers

solution1 3 2014-01-02 07:22:53

solution2 1 2014-01-02 07:16:49

solution3 0 2014-01-02 06:45:01

solution1
3 2014-01-02 07:22:53

solution2
1 2014-01-02 07:16:49

solution3
0 2014-01-02 06:45:01