Efficiently generate all composite numbers less than N (with their factorizations)

Question

I'd like to build an efficient Python iterator/generator that yields:

- All composite numbers less than N
- Along with their prime factorization

I'll call it "composites_with_factors()"

Assume we already have a list of primes less than N, or a primes generator that can do the same.

Note that I:

- DO NOT need the numbers to be yielded in numerical order
- DO NOT care if 1 is yielded at the beginning or not
- DO NOT care if primes are yielded, too

I figure this can be done with a clever recursive generator...

So, for example, a call to composites_with_factors(16) may yield:

# yields values in form of "composite_value, (factor_tuple)"
2, (2,)
4, (2, 2)
8, (2, 2, 2)
6, (2, 3)
12, (2, 2, 3)
10, (2, 5)
14, (2, 7)
3, (3,)
9, (3, 3)
15, (3, 5)
5, (5,)
7, (7,)
11, (11,)
13, (13,)

As you can see from the order of my output, I imagine this working as follows: start with the smallest prime from the primes generator and output all powers of that prime less than N; then go through those powers again, at each stage checking whether powers of additional primes can be multiplied in while staying below N. When all combinations with THAT prime are done, drop it and repeat with the next-smallest prime available from the primes generator.
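A minimal sketch in the spirit of that scheme, assuming `primes` is a sorted list of all primes below n (the question says such a list is available). For each prime p it walks the powers p, p**2, ... below n and, for each power, recurses only into strictly larger primes, so every product is built exactly once from its factorization. The `primes` and `start` parameters are illustrative additions to the question's `composites_with_factors()` name.

```python
def composites_with_factors(n, primes, start=0):
    """Yield (value, factor_tuple) for every product of primes[start:] below n.

    primes must be sorted ascending; values come out in no particular order.
    """
    for i in range(start, len(primes)):
        p = primes[i]
        if p >= n:
            break  # primes are sorted, so no later prime fits either
        power, factors = p, (p,)
        while power < n:
            yield power, factors
            # Extend this prime power with products of strictly larger primes.
            # For integers, rest < n / power  <=>  rest < ceil(n / power).
            bound = -(-n // power)
            for rest, rest_factors in composites_with_factors(bound, primes, i + 1):
                yield power * rest, factors + rest_factors
            power *= p
            factors += (p,)


PRIMES_BELOW_16 = [2, 3, 5, 7, 11, 13]  # assumed to come from a primes generator
for value, factors in composites_with_factors(16, PRIMES_BELOW_16):
    print(value, factors)
```

Primes are yielded too (which the question allows) and 1 is not; to keep only composites, skip entries whose factor tuple has length 1.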

My attempts to do this with "recursive generators" have left me very confused about when to pop out of the recursion: with "yield", with "raise StopIteration", with "return", or by simply falling off the end of the recursive function.
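On the termination question: inside a generator, a bare `return` (or simply falling off the end of the function body) ends the iteration, and the caller's `for` loop finishes cleanly. Raising `StopIteration` yourself is no longer allowed: since Python 3.7 (PEP 479) it is converted into a `RuntimeError`. To re-yield everything a recursive call produces as part of one stream, Python 3.3+ offers `yield from`; in Python 2 the equivalent is `for x in countdown(k - 1): yield x`. The functions below (`countdown`, `raises_stopiteration`) are illustrative only.

```python
def countdown(k):
    """Recursive generator: yields k, k-1, ..., 1 as one flat stream."""
    if k <= 0:
        return  # ends this generator; the caller's for-loop simply finishes
    yield k
    # Re-yield everything the recursive call produces (Python 3.3+).
    yield from countdown(k - 1)
    # Falling off the end here is the same as a bare `return`.


def raises_stopiteration():
    """Old-style ending that modern Python rejects."""
    yield 1
    raise StopIteration  # since Python 3.7 (PEP 479): turned into RuntimeError


print(list(countdown(3)))  # [3, 2, 1]
try:
    list(raises_stopiteration())
except RuntimeError as exc:
    print("RuntimeError:", exc)
```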

Thanks for your wisdom!

ADDITIONAL NOTE:

I do have one way to do this now: I have written a function to factor numbers, so I can factor them down to primes, and yield the results. No problem. I keep this blazingly fast by relying on a cache of "what is the lowest prime factor of number N"... for N up to 10 million.

However, once I'm beyond the cache, well, it devolves to "naive" factoring. (Yuck.)
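For reference, one common way to build a "lowest prime factor" cache like the one described is a smallest-prime-factor sieve; this is a sketch, not the asker's code, and `smallest_prime_factors`, `factor` and `composites_below` are hypothetical names. Storing the table in an `array('I')` keeps one C unsigned int per entry rather than a pointer to a Python int object, which matters at 10 million entries.

```python
from array import array


def smallest_prime_factors(limit):
    """spf[m] = smallest prime factor of m for 2 <= m < limit (0 and 1 stay 0)."""
    spf = array("I", [0]) * limit  # compact typed storage, all zeros
    for p in range(2, limit):
        if spf[p] == 0:  # nothing smaller marked p, so p is prime
            spf[p] = p
            # A composite's smallest prime q satisfies q*q <= m, so start at p*p.
            for m in range(p * p, limit, p):
                if spf[m] == 0:
                    spf[m] = p
    return spf


def factor(m, spf):
    """Prime factors of m in ascending order, by repeatedly dividing out spf[m]."""
    factors = []
    while m > 1:
        p = spf[m]
        factors.append(p)
        m //= p
    return tuple(factors)


def composites_below(limit):
    """Yield (m, factors) for every composite m < limit, in increasing order."""
    spf = smallest_prime_factors(limit)
    for m in range(4, limit):
        if spf[m] != m:  # composites are exactly the m whose spf is not m itself
            yield m, factor(m, spf)
```

Each lookup costs one division per prime factor, so factoring from the cache is cheap; the limit is the memory for the table itself.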

The point of this post is:

- I'm assuming that "generating large composites from their factors" will be faster than "factoring large composites", especially since I DON'T care about order; and
- How can you have a Python generator "recursively" call itself and yield a single stream of generated things?

Answer 1

Assuming primesiter(n) creates an iterator over all primes up to n (1 must NOT be included in primesiter, or the following code will enter an infinite loop):

def composite_value(n, min_p = 0):
    for p in primesiter(n):
        # avoid double solutions such as (6, [2,3]), and (6, [3,2])
        if p < min_p: continue
        yield (p, [p])
        for t, r in composite_value(n//p, min_p = p): # uses integer division
            yield (t*p, [p] + r)

Output

>>> list(composite_value(16))
[(2, [2]),
 (4, [2, 2]),
 (8, [2, 2, 2]),
 (16, [2, 2, 2, 2]),
 (12, [2, 2, 3]),
 (6, [2, 3]),
 (10, [2, 5]),
 (14, [2, 7]),
 (3, [3]),
 (9, [3, 3]),
 (15, [3, 5]),
 (5, [5]),
 (7, [7]),
 (11, [11]),
 (13, [13])]

NOTE: it includes n (= 16) as well, and I used list instead of tuples. Both can easily be resolved if needed, but I will leave that as an exercise.

Answer 2

Here is a sieve-based implementation (please excuse the un-Pythonic code :-)):

def sieve(n):
    # start each number off with an empty list of factors
    #   note that nums[n] will give the factors of n
    nums = [[] for x in range(n)]
    # start the counter at the first prime
    prime = 2
    while prime < n:
        power = prime
        while power < n:
            multiple = power
            while multiple < n:
                nums[multiple].append(prime)
                multiple += power
            power *= prime
        # find the next prime
        #   the next number with no factors
        k = prime + 1
        if k >= n:    # no primes left!!!
            return nums
        # the prime will have an empty list of factors
        while len(nums[k]) > 0:
            k += 1
            if k >= n:    # no primes left!!!
                return nums
        prime = k
    return nums


def runTests():
    primes = sieve(100)
    if primes[3] == [3]:
        print("passed")
    else:
        print("failed")
    if primes[10] == [2, 5]:
        print("passed")
    else:
        print("failed")
    if primes[32] == [2, 2, 2, 2, 2]:
        print("passed")
    else:
        print("failed")

Tests:

>>> runTests()
passed
passed
passed

On my machine, this took 56 seconds to run:

primes = sieve(14000000) # 14 million!

Examples:

>>> primes[:10]
[[], [], [2], [3], [2, 2], [5], [2, 3], [7], [2, 2, 2], [3, 3]]

>>> primes[10000]
[2, 2, 2, 2, 5, 5, 5, 5]

>>> primes[65536]
[2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]

>>> primes[6561]
[3, 3, 3, 3, 3, 3, 3, 3]

>>> primes[233223]
[3, 17, 17, 269]

Memory consumption: about 53 million integers, in 14 million lists:

>>> sum(map(len, primes))
53303934

Answer 3

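Recursively (in Python; `is_prime` and `next_prime` are plain trial-division helpers):

import math

def is_prime(q):
    # simple trial-division helper, standing in for "list_of_all_primes"
    return q >= 2 and all(q % d for d in range(2, math.isqrt(q) + 1))

def next_prime(p):
    # smallest prime greater than p
    q = p + 1
    while not is_prime(q):
        q += 1
    return q

def get_factorizations_of_all_numbers(start, end, minp):
    # Returns (m, factors) for every m in [start, end] whose prime factors
    # are all >= minp; 1 is included with factors [] so that minp * 1 appears.
    if start > end:
        return []
    if minp * minp > end:
        # no composite in range has all factors >= minp,
        # so only 1 and the primes >= minp remain
        result = [(1, [])] if start <= 1 else []
        result += [(p, [p]) for p in range(max(start, minp), end + 1) if is_prime(p)]
        return result
    # multiples of minp: minp * m for m in [ceil(start/minp), floor(end/minp)],
    # passing minp again so it can repeat (e.g. 8 = 2 * 2 * 2)
    a = [(minp * m, [minp] + f)
         for m, f in get_factorizations_of_all_numbers(-(-start // minp), end // minp, minp)]
    # numbers not divisible by minp: move on to the next prime
    b = get_factorizations_of_all_numbers(start, end, next_prime(minp))
    return a + b

n = 16
get_factorizations_of_all_numbers(1, n, 2)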

3 answers

solution1: score 10, accepted, 2012-04-11 16:26:17
solution2: score 4, 2012-04-11 16:45:04
solution3: score 0, 2012-04-11 16:42:00
