简体   繁体   中英

python recursion out of memory

When this code runs, OSX notifies me that I'm out of application memory and pauses the application. The amount of space that is used by Python breaks 10 gigs very quickly. This code never hits Python's max levels of recursion, it should only hit 525 worst case, but due to caching it should be much smaller. I have a feeling that the list chain is getting copied with every level of recursion, but it seems like it's a global variable that should be shared with every call of collatz(). I've looked for similar questions on stackoverflow, but I haven't found any that are the same.

# The following iterative sequence is defined for the set of positive integers:
#  n = n/2  (n is even)
#    = 3n+1 (n is odd)
# Using the rule above and starting with 13, we generate the following sequence:
#  13,40,20,10,5,16,8,4,2,1
# It can be seen that this sequence (starting at 13 and finishing at 1) contains 10 
#  terms. Although it has not been proved yet (Collatz Problem), it is thought that
#  all starting numbers finish at 1.
# Which starting number, under one million, produces the longest chain?
# Comments: I'm using dynamic programming to avoid computing the same values over and over
#            again.

lim = 1000000
chain = [1,1]
maxL = 0

def collatz(i):
   if i >= len(chain): chain.extend([None]*(i-len(chain)+1))
   if chain[i] == None: chain[i] = 1 + collatz(i/2 if i%2==0 else 3*i+1)
   return chain[i]

for i in range(1, lim): 
   maxL = i if (collatz(i) > chain[maxL]) else maxL 
   print i, chain[maxL]
print "collatz chain of {} is {} terms long.".format(maxL, collatz(maxL))

EDIT: See my working dictionary implementation here: https://stackoverflow.com/a/20229855/1084773 .

To see your memory error, run your code with limit = 100 and then print out chain.

Maybe you want to serialize your recursive code:

lengths = {1: 1}

def collatz(i):
    i0 = i
    acc = 0
    while True:
        if i in lengths:
            lengths[i0] = acc + lengths[i]
            return acc + lengths[i]
        acc += 1
        i = (i * 3 + 1) if i % 2 else i // 2

longest = 1
for i in range(1, 1000000):
    c = collatz(i)
    if c > longest:
        longest = c
        print(i, c)

This surely can still be optimized in many ways, but it yields the expected result in 4 seconds.


EDIT:

Your approach creates a list with a length of the highest term ever reached. For limit = 100, this is 9232. Which is not that much. But for limit = 1000000 it is 56991483520 (the chain starting with 704511), which is quite a whopper. If it were a mere array over int32, this would already be 212 GB of memory, and in reality it is more than that.


Here the troublesome chain: 704511, 2113534, 1056767, 3170302, 1585151, 4755454, 2377727, 7133182, 3566591, 10699774, 5349887, 16049662, 8024831, 24074494, 12037247, 36111742, 18055871, 54167614, 27083807, 81251422, 40625711, 121877134, 60938567, 182815702, 91407851, 274223554, 137111777, 411335332, 205667666, 102833833, 308501500, 154250750, 77125375, 231376126, 115688063, 347064190, 173532095, 520596286, 260298143, 780894430, 390447215, 1171341646, 585670823, 1757012470, 878506235, 2635518706, 1317759353, 3953278060, 1976639030, 988319515, 2964958546, 1482479273, 4447437820, 2223718910, 1111859455, 3335578366, 1667789183, 5003367550, 2501683775, 7505051326, 3752525663, 11257576990, 5628788495, 16886365486, 8443182743, 25329548230, 12664774115, 37994322346, 18997161173, 56991483520 , 28495741760, 14247870880, 7123935440, 3561967720, 1780983860, 890491930, 445245965, 1335737896, 667868948, 333934474, 166967237, 500901712, 250450856, 125225428, 62612714, 31306357, 93919072, 46959536, 23 479768, 11739884, 5869942, 2934971, 8804914, 4402457, 13207372, 6603686, 3301843, 9905530, 4952765, 14858296, 7429148, 3714574, 1857287, 5571862, 2785931, 8357794, 4178897, 12536692, 6268346, 3134173, 9402520, 4701260, 2350630, 1175315, 3525946, 1762973, 5288920, 2644460, 1322230, 661115, 1983346, 991673, 2975020, 1487510, 743755, 2231266, 1115633, 3346900, 1673450, 836725, 2510176, 1255088, 627544, 313772, 156886, 78443, 235330, 117665, 352996, 176498, 88249, 264748, 132374, 66187, 198562, 99281, 297844, 148922, 74461, 223384, 111692, 55846, 27923, 83770, 41885, 125656, 62828, 31414, 15707, 47122, 23561, 70684, 35342, 17671, 53014, 26507, 79522, 39761, 119284, 59642, 29821, 89464, 44732, 22366, 11183, 33550, 16775, 50326, 25163, 75490, 37745, 113236, 56618, 28309, 84928, 42464, 21232, 10616, 5308, 2654, 1327, 3982, 1991, 5974, 2987, 8962, 4481, 13444, 6722, 3361, 10084, 5042, 2521, 7564, 3782, 1891, 5674, 2837, 8512, 4256, 2128, 1064, 532, 266, 133, 400, 200, 100, 50, 25, 76, 38, 19, 58, 29, 88, 44, 22, 11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1


Using your exact recursive idea, but using a dict (which is sparse) instead of a list (runs with no probs):

lengths = {1: 1}

def collatz(i):
    if i in lengths: return lengths [i]
    j = (i * 3 + 1) if i % 2 else i // 2
    c = collatz (j) + 1
    lengths [i] = c
    return c

longest = 1
for i in range(1, 1000000):
    c = collatz(i)
    if c > longest:
        longest = c
        print(i, c)

The list implementation used a horrible amount of extra memory (as pointed out by Hyperboreus, thanks!). Switching to a dictionary reduced the memory usage from ~212GB to ~70MB with 32byte ints and solved the problem in two seconds.

lim = 1000000
chain = {1:1} 
maxL = 1

def collatz(i):
   if i not in chain: chain[i] = 1 + collatz(i/2 if i%2==0 else 3*i+1)
   return chain[i]

for i in range(1, lim): maxL = i if (collatz(i) > chain[maxL]) else maxL 
print "collatz chain of {} is {} terms long.".format(maxL, collatz(maxL))

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM