

Is there a way to avoid this memory error?

I'm currently working through the problems on Project Euler, and so far I've come up with this code for a problem.

from itertools import combinations
import time

def findanums(n):
    """Return all abundant numbers up to n (numbers whose proper divisors sum to more than the number)."""
    l = []
    for i in range(1, n + 1):
        s = []
        for j in range(1, i):
            if i % j == 0:
                s.append(j)
        if sum(s) > i:
            l.append(i)
    return l

start = time.time() #start time

limit = 28123

anums = findanums(limit + 1) #abundant numbers (1..limit)
print "done finding abundants", time.time() - start

pairs = combinations(anums, 2)
print "done finding combinations", time.time() - start

sums = map(lambda x: x[0]+x[1], pairs)
print "done finding all possible sums", time.time() - start

print "start main loop"
answer = 0
for i in range(1,limit+1):
    if i not in sums:
        answer += i
print "ANSWER:",answer

When I run this I run into a MemoryError.

The traceback:

File "test.py", line 20, in <module>
    sums = map(lambda x: x[0]+x[1], pairs)

I've tried to prevent it by disabling garbage collection, based on what I could find on Google, but to no avail. Am I approaching this the wrong way? In my head this feels like the most natural way to do it, and I'm at a loss at this point.

SIDE NOTE: I'm using PyPy 2.0 Beta2 (Python 2.7.4) because it is so much faster than any other Python implementation I've used, and SciPy/NumPy are over my head as I'm still just beginning to program and I don't have an engineering or strong math background.

As Kevin mentioned in the comments, your algorithm might be wrong, but in any case your code is not optimized.

When using very big lists, it is common to use generators; there is a famous, great Stack Overflow answer explaining the concepts of yield and generator - What does the "yield" keyword do in Python?

The problem is that your pairs = combinations(anums, 2) doesn't generate a generator, but generates a large object, which consumes much more memory.

I changed your code to use this function; since you iterate over the collection only once, you can lazily evaluate the values:

def generator_sol(anums1, s):
    for comb in itertools.combinations(anums1, s):
        yield comb

Now, instead of pairs = combinations(anums, 2), which generates a large object, use:

pairs = generator_sol(anums, 2)

Then, instead of using the lambda, I would use another generator:

sums_sol = (x[0]+x[1] for x in pairs)
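As a quick, self-contained illustration of why the generator expression helps (the numbers below are made up for the demo, not the ones from the question): the lazy version stays constant-size, while the eager version materializes every sum at once.

```python
import sys
from itertools import combinations

nums = list(range(1000))  # any list of numbers for the demo

# lazy: a generator object of fixed, tiny size
lazy_sums = (a + b for (a, b) in combinations(nums, 2))

# eager: a list holding all ~500k sums in memory at once
eager_sums = [a + b for (a, b) in combinations(nums, 2)]

print(sys.getsizeof(lazy_sums) < sys.getsizeof(eager_sums))  # True
```

Note that sys.getsizeof reports only the container's own size, which is exactly the point here: the generator's footprint does not grow with the number of pairs it will eventually yield.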

Another tip: you'd better look at xrange, which is more suitable here; it doesn't generate a list but an xrange object, which is more efficient in many cases (such as this one).
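A minimal sketch of that difference (written against Python 3, where the built-in range already behaves like Python 2's xrange):

```python
import sys

# Python 2's range(n) builds a full list; xrange(n) yields values on demand.
# In Python 3, range() is already lazy, so it stands in for xrange here.
lazy = range(10**6)          # small constant-size object
eager = list(range(10**6))   # one million ints materialized at once

print(sys.getsizeof(lazy) < sys.getsizeof(eager))  # True
```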

Let me suggest that you use generators. Try changing this:

sums = map(lambda x: x[0]+x[1], pairs)

to

sums = (a+b for (a,b) in pairs)
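One caveat worth flagging (a sketch, not part of either answer): the question's final loop tests i not in sums once per candidate, and a generator can only be scanned once. For repeated membership tests it helps to collect the sums into a set, keeping only those at or below the limit so memory stays bounded. The helper name below is invented for illustration, and including the a + a self-sums is one guess at what "your algorithm might be wrong" refers to, since combinations never pairs an element with itself:

```python
from itertools import combinations

def abundant_sums_below(limit, abundants):
    # set of every number <= limit expressible as a sum of two abundant
    # numbers; sums above the limit are discarded, so the set stays bounded
    sums = set()
    for a, b in combinations(abundants, 2):
        if a + b <= limit:
            sums.add(a + b)
    # a number can also be the sum of an abundant number with itself
    for a in abundants:
        if 2 * a <= limit:
            sums.add(2 * a)
    return sums
```

Membership tests against the set are then O(1), and the loop over range(1, limit + 1) works unchanged.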

Ofiris's solution is also OK, but it implies that itertools.combinations returns a list, which is wrong. If you are going to keep solving Project Euler problems, have a look at the itertools documentation.
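That point is easy to verify: itertools.combinations returns a lazy iterator object, so the iterator itself stays tiny even when the number of pairs it could yield is astronomically large (a minimal check):

```python
import sys
from itertools import combinations

pairs = combinations(range(10**6), 2)  # ~5 * 10**11 pairs, never materialized

print(isinstance(pairs, list))     # False: it is a lazy iterator
print(sys.getsizeof(pairs) < 1000) # True: the iterator object itself is tiny
print(next(pairs))                 # (0, 1) -- values are produced on demand
```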

The issue is that anums is big - about 28000 elements long - so pairs must be 28000*28000*8 bytes = 6GB. If you used numpy you could cast anums as a numpy.int16 array, in which case the result pairs would be 1.5GB - more manageable:

import numpy as np
#cast (note: pairwise sums reach 2*28123 = 56246, which overflows int16;
#np.int32 avoids that, at the cost of doubling the size estimate above)
anums = np.array([anums],dtype=np.int16)
#compute the sum of all the pairs via outer product
pairs = (anums + anums.T).ravel()
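Carrying this through to the final answer, a sketch (the helper name is invented; np.int32 is used instead of int16 so that pairwise sums up to 2*28123 = 56246 cannot overflow, and np.isin is assumed available in the installed NumPy):

```python
import numpy as np

def non_abundant_sum(limit, abundants):
    # outer-product sums of every ordered pair of abundant numbers,
    # including a number paired with itself (the diagonal)
    a = np.array(abundants, dtype=np.int32)
    pair_sums = (a[None, :] + a[:, None]).ravel()
    candidates = np.arange(1, limit + 1)
    # keep the integers that are NOT expressible as such a sum
    return int(candidates[~np.isin(candidates, pair_sums)].sum())
```

On a toy case - limit 25 with abundants [12, 18, 20, 24] - only 24 is a sum of two abundants, so the result is 1 + 2 + ... + 25 - 24 = 301.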
