
Python list of Objects taking up too much memory

I have the following code, which creates a million objects of a class foo:

list_bar = []
for i in range(1000000):
    bar = foo()
    list_bar.append(bar)

The bar object is only 96 bytes, as determined by getsizeof(). However, the append step takes up almost 8 GB of RAM. Once the code exits the loop, RAM usage drops to the expected amount (size of the list plus some overhead, ~103 MB). Only while the loop is running does RAM usage skyrocket. Why does this happen? Any workarounds? PS: Using a generator is not an option; it has to be a list.

EDIT: xrange doesn't help; I'm using Python 3. The memory usage stays high only during loop execution and drops after the loop is through. Could append have some non-obvious overhead?

Most probably this is due to some unintended cyclic references made by the foo() constructor; normally, Python objects release memory instantly when their reference count drops to zero, but objects caught in a cycle are only freed later, when the garbage collector gets a chance to run.
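For illustration, here is a minimal sketch of what such a cycle can look like; the Foo class below is hypothetical, since the question does not show the real constructor:

import gc

class Foo:
    def __init__(self):
        self.me = self     # self-reference: a reference cycle

bar = Foo()
del bar                    # refcount never reaches zero, so nothing is freed yet
print(gc.collect())        # the cycle collector reclaims it; prints the number of unreachable objects found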

You can try forcing a GC run after, say, every 10000 iterations to see if it keeps the memory usage constant.

import gc

n = 1000000
list_bar = [None] * n        # preallocate so the list never resizes
for i in range(n):
    list_bar[i] = foo()
    if i % 10000 == 0:
        gc.collect()         # periodically reclaim any cyclic garbage

If this relieves the memory pressure, then the memory usage is due to reference cycles.
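If cycles are confirmed, one common workaround is to hold back-references weakly so the cycle never forms; a sketch with hypothetical Parent/Child names:

import weakref

class Parent:
    def __init__(self):
        self.children = []

class Child:
    def __init__(self, parent):
        parent.children.append(self)
        self._parent = weakref.ref(parent)   # weak back-reference: no cycle

    @property
    def parent(self):
        return self._parent()                # None once the parent is gone

p = Parent()
c = Child(p)
del p
print(c.parent)   # None: the parent was freed immediately by refcounting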


Resizing a list has some overhead. If you know how many elements there will be, you can create the full list beforehand, e.g.:

list_bar = [foo() for _ in range(1000000)]

This way the size of the list is known up front and no resizing is needed; or create the list filled with None:

n = 1000000
list_bar = [None] * n
for i in range(n):
    list_bar[i] = foo()

append should be using realloc to grow the list, and the old memory ought to be released as soon as possible; all in all, the overhead of the allocations should not add up to 8 GB for a list that ends up around 100 MB. It is possible that the operating system is misreporting the memory used.
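You can observe CPython's over-allocation pattern directly; this is just a sketch of an implementation detail, and the exact growth sequence varies between versions:

import sys

lst = []
prev = sys.getsizeof(lst)
for _ in range(64):
    lst.append(None)
    size = sys.getsizeof(lst)
    if size != prev:                # a realloc just happened
        print(f"len={len(lst):>3}  size={size} bytes")
        prev = size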

How are you measuring the memory usage?
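One way to check the process's actual footprint from the standard library is shown below; note that resource is Unix-only, and ru_maxrss is reported in kilobytes on Linux but bytes on macOS, so verify the units on your platform:

import resource

usage = resource.getrusage(resource.RUSAGE_SELF)
print(f"peak RSS so far: {usage.ru_maxrss} (platform-dependent units)")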

I suspect your usage of a third-party module might be the cause. Perhaps the third-party module temporarily uses a lot of memory when initialised.

Besides, sys.getsizeof() is not an accurate indication of the memory used by an object.

For example:

from sys import getsizeof

class A(object):
    pass

class B(object):
    def __init__(self):
        self.big = 'a' * 1024*1024*1024    # approx. 1 GiB

>>> getsizeof(A)
976
>>> a = A()
>>> getsizeof(a)
64
>>> 
>>> getsizeof(B)
976
>>> b = B()
>>> getsizeof(b)
64
>>> getsizeof(b.big)
1073741873

After instantiating b = B(), top reports approx. 1 GiB of resident memory usage. Obviously this is not reflected by getsizeof(b), which returns only 64 bytes.
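A rough way to approximate an object's true footprint is to recurse through containers and instance dictionaries. The deep_getsizeof helper below is only a sketch: it ignores __slots__, C-level buffers, and memory shared with untracked objects:

import sys

def deep_getsizeof(obj, seen=None):
    # Approximate recursive size; counts each object at most once.
    if seen is None:
        seen = set()
    if id(obj) in seen:
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(deep_getsizeof(k, seen) + deep_getsizeof(v, seen)
                    for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(deep_getsizeof(x, seen) for x in obj)
    if hasattr(obj, '__dict__'):
        size += deep_getsizeof(obj.__dict__, seen)
    return size

print(deep_getsizeof(B()))   # now includes the ~1 GiB string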
