Python list of Objects taking up too much memory
I have the following code, which creates a million objects of a class foo:
list_bar = []
for i in range(1000000):
    bar = foo()
    list_bar.append(bar)
The bar object is only 96 bytes, as determined by getsizeof(). However, the append step takes almost 8 GB of RAM. Once the code exits the loop, the RAM usage drops to the expected amount (size of the list plus some overhead, ~103 MB). Only while the loop is running does the RAM usage skyrocket. Why does this happen? Are there any workarounds? PS: Using a generator is not an option; it has to be a list.
EDIT: xrange doesn't help; I'm using Python 3. The memory usage stays high only during the loop's execution and drops after the loop is through. Could append have some non-obvious overhead?
Most probably this is due to some unintended cyclical references made by the foo() constructor. Normally Python objects release memory instantly when their reference count drops to zero, but objects caught in reference cycles are only freed later, when the garbage collector gets a chance to run. You can try forcing a GC run every 10000 iterations, say, to see if that keeps the memory usage constant:
import gc
n = 1000000
list_bar = [None] * n
for i in range(n):
    list_bar[i] = foo()
    if i % 10000 == 0:
        gc.collect()
If this relieves the memory pressure, then the usage is because of some reference cycles.
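To see why cycles matter here, a minimal sketch (the `Foo` class with a deliberate self-reference is a made-up illustration, not the asker's actual `foo`): `gc.collect()` returns the number of unreachable objects it found, so a nonzero return after dropping the list confirms that cycles were holding memory that reference counting alone could not reclaim.

```python
import gc

class Foo:
    def __init__(self):
        # A deliberate self-reference creates a cycle that reference
        # counting alone cannot reclaim.
        self.me = self

gc.disable()                       # mimic the GC not getting a chance to run
objs = [Foo() for _ in range(1000)]
del objs                           # refcounts stay nonzero because of the cycles

collected = gc.collect()           # returns the number of unreachable objects found
gc.enable()
print(collected > 0)               # True: the collector had cycles to free
```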
The resizing of a list has some overhead. If you know how many elements you need, you can create the list beforehand, e.g.:
list_bar = [foo() for _ in range(1000000)]
This knows the size of the list in advance and does not need to resize it. Or create the list filled with None:
n = 1000000
list_bar = [None] * n
for i in range(n):
    list_bar[i] = foo()
append should be using realloc to grow the list, but old memory ought to be released as soon as possible; in any case, the overhead of all the memory allocated should not sum to 8 GB for a list that is 100 MB at the end. It is possible that the operating system is miscalculating the memory used.
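The growth behaviour of append can be observed directly with sys.getsizeof: CPython over-allocates on each resize, so the reported size of the list jumps only occasionally, and most appends reuse spare capacity without a realloc. A small sketch:

```python
import sys

lst = []
sizes = []
for i in range(100):
    lst.append(i)
    sizes.append(sys.getsizeof(lst))

# Indices where the list's reported size changed, i.e. where a
# reallocation actually happened.
growth_points = [i for i in range(1, 100) if sizes[i] != sizes[i - 1]]
print(len(growth_points))  # far fewer than 99: most appends were free
```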
How are you measuring the memory usage?
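If it helps, one way to measure Python-level allocations (rather than the whole-process resident memory that top reports, which includes allocator and interpreter overhead) is the standard-library tracemalloc module. A minimal sketch, with a bytearray payload standing in for the real objects:

```python
import tracemalloc

tracemalloc.start()

# Roughly 1 MB of payload, standing in for the real objects.
data = [bytearray(1000) for _ in range(1000)]

current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

# current and peak are in bytes and count only Python allocations,
# unlike top, which reports the whole process.
print(current > 1000 * 1000)
```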
I suspect your use of a third-party module might be the cause. Perhaps the third-party module temporarily uses a lot of memory when initialised. Besides, sys.getsizeof() is not an accurate indication of the memory used by an object.
For example:
from sys import getsizeof

class A(object):
    pass

class B(object):
    def __init__(self):
        self.big = 'a' * 1024*1024*1024  # approx. 1 GiB
>>> getsizeof(A)
976
>>> a = A()
>>> getsizeof(a)
64
>>>
>>> getsizeof(B)
976
>>> b = B()
>>> getsizeof(b)
64
>>> getsizeof(b.big)
1073741873
After instantiating b = B(), top reports approx. 1 GiB of resident memory usage. Obviously this is not reflected by getsizeof(b), which returns only 64 bytes.
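This is because sys.getsizeof counts only the object's own header, not the objects it references. A rough recursive sketch (deep_getsizeof is a hypothetical helper, not a standard-library function; the 1 GiB string from the answer above is scaled down to 1 MiB so the example runs quickly) gives a truer picture:

```python
import sys

def deep_getsizeof(obj, seen=None):
    """Recursively sum getsizeof over an object and everything it
    references. A rough sketch: handles common containers and instance
    __dict__, and avoids double-counting via a set of seen id()s."""
    if seen is None:
        seen = set()
    if id(obj) in seen:
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(deep_getsizeof(k, seen) + deep_getsizeof(v, seen)
                    for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(deep_getsizeof(item, seen) for item in obj)
    if hasattr(obj, '__dict__'):
        size += deep_getsizeof(obj.__dict__, seen)
    return size

class B:
    def __init__(self):
        self.big = 'a' * 1024 * 1024   # approx. 1 MiB (scaled down)

b = B()
print(sys.getsizeof(b))                  # shallow: just the instance header
print(deep_getsizeof(b) > 1024 * 1024)   # deep: includes the big string
```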