
How do I prevent a memory leak when I load large pickle files in a for loop?

I have 50 pickle files that are 0.5 GB each. Each pickle file contains a list of custom class objects. I have no trouble loading the files individually using the following function:

import pickle

def loadPickle(fp):
    # Read one pickle file and return the list of objects it contains
    with open(fp, 'rb') as fh:
        listOfObj = pickle.load(fh)
    return listOfObj

However, when I try to load the files iteratively, I get a memory leak.

l = ['filepath1', 'filepath2', 'filepath3', 'filepath4']
for fp in l:
    x = loadPickle(fp)  # x still references the previous list while the next one loads
    print('loaded {0}'.format(fp))

My memory overflows before "loaded filepath2" is printed. How can I write code that guarantees that only a single pickle is loaded during each iteration?

Answers to related questions on SO suggest using objects defined in the weakref module or explicit garbage collection using the gc module, but I am having a difficult time understanding how I would apply these methods to my particular use case. This is because I have an insufficient understanding of how referencing works under the hood.

Related: Python garbage collection

You can fix that by adding x = None right after the for fp in l: line.

This works because it dereferences the variable x, hence allowing the Python garbage collector to free the memory before loadPickle() is called the second time.
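Putting the fix together, here is a minimal sketch of the corrected loop. The file paths are the placeholders from the question, and the gc.collect() call is an optional addition beyond the answer above; it only matters if the unpickled objects contain reference cycles:

import gc
import pickle

def loadPickle(fp):
    with open(fp, 'rb') as fh:
        return pickle.load(fh)

l = ['filepath1', 'filepath2', 'filepath3', 'filepath4']
for fp in l:
    x = None           # drop the reference to the previous iteration's list
    gc.collect()       # optional: reclaim any reference cycles immediately
    x = loadPickle(fp)
    print('loaded {0}'.format(fp))

Because CPython frees an object as soon as its reference count drops to zero, rebinding x this way means at most one file's worth of objects is alive at any point in the loop.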
