
Why does comparison of a numpy array with a list consume so much memory?

This bit stung me recently. I solved it by removing all comparisons of numpy arrays with lists from the code. But why does the garbage collector fail to collect it?

Run this and watch it eat your memory:

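import numpy as np

r = np.random.rand(2)  # array of shape (2,)
l = []                 # empty list, cannot be broadcast against r
while True:
    r == l             # memory grows on every iteration (numpy 1.6.2)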

Running on 64-bit Ubuntu 10.04, virtualenv 1.7.2, Python 2.7.3, Numpy 1.6.2.

Just in case someone stumbles on this and wonders...

@Dugal yes, I believe this is a memory leak in current numpy versions (Sept. 2012) that occurs when some Exceptions are raised (see this and this). Why adding the gc call, as @BiRico did, "fixes" it seems weird to me, though apparently it must be done right afterwards? Maybe it's an oddity in how Python garbage-collects tracebacks. If someone knows the exception handling and garbage collection internals of CPython, I would be interested.

Workaround: This is not directly related to lists; it applies to most broadcasting Exceptions, for example. The empty list does not fit the array's size, and an empty array results in the same leak. (Note that internally an Exception is prepared that never surfaces.) So as a workaround, you should probably just check first whether the shape is correct. That only matters if you do this a lot; otherwise I wouldn't really worry, since this leaks just a small string if I got it right.
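A minimal sketch of that shape check, in case it helps. The helper name `equal_if_same_shape` is made up for illustration and is not part of numpy, and it assumes you only want to compare arrays of identical shape, which is stricter than numpy's broadcasting rules (a scalar or a length-1 operand would be rejected too):

```python
import numpy as np

def equal_if_same_shape(arr, other):
    """Hypothetical helper (not a numpy function).

    Runs the elementwise comparison only when both operands have the
    same shape, so the failing broadcast is never attempted.
    """
    other = np.asarray(other)
    if arr.shape != other.shape:
        # Shapes differ: skip `arr == other` entirely.
        return False
    return arr == other

r = np.random.rand(2)
print(equal_if_same_shape(r, []))        # shapes (2,) vs (0,): no comparison is run
print(equal_if_same_shape(r, r.copy()))  # same shape: elementwise result
```

If you rely on broadcasting elsewhere, the exact-shape test would need to be relaxed accordingly.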

FIXED: This issue will be fixed with numpy 1.7.

Sorry I cannot give a more complete answer, but this seems to have something to do with garbage collection. I was able to recreate this issue using Python 2.7.2 and numpy 1.6.1 on Redhat 5.8. However, when I tried the following, memory usage went back to normal.

import gc
import numpy as np

r = np.random.rand(2)
l = []
while True:
    r == l
    gc.collect()  # force a collection right after each comparison
