
How do I prevent a memory leak when I load large pickle files in a for loop?

I have 50 pickle files that are 0.5 GB each. Each pickle file contains a list of custom class objects. I have no trouble loading the files individually using the following function:

import pickle

def loadPickle(fp):
    # Read one pickle file and return the list of objects it contains
    with open(fp, 'rb') as fh:
        listOfObj = pickle.load(fh)
    return listOfObj

However, when I try to load the files iteratively, I get a memory leak.

l = ['filepath1', 'filepath2', 'filepath3', 'filepath4']
for fp in l:
    x = loadPickle(fp)  # x still references the previous list while the next one loads
    print('loaded {0}'.format(fp))

My memory overflows before "loaded filepath2" is printed. How can I write code that guarantees that only a single pickle is loaded during each iteration?

Answers to related questions on SO suggest using objects defined in the weakref module or explicit garbage collection using the gc module, but I am having a difficult time understanding how I would apply these methods to my particular use case. This is because I have an insufficient understanding of how referencing works under the hood.

Related: Python garbage collection

You can fix that by adding x = None right after the for fp in l: line.

This works because it dereferences the variable x, hence allowing the Python garbage collector to free the memory before loadPickle() is called the second time.
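Putting the fix together, here is a minimal sketch of the corrected loop. The file paths are the placeholders from the question, and the gc.collect() call is an optional addition beyond the answer above; it only matters if the unpickled objects contain reference cycles:

import gc
import pickle

def loadPickle(fp):
    with open(fp, 'rb') as fh:
        return pickle.load(fh)

l = ['filepath1', 'filepath2', 'filepath3', 'filepath4']
for fp in l:
    x = None           # drop the reference to the previous iteration's list
    gc.collect()       # optional: reclaim any reference cycles immediately
    x = loadPickle(fp)
    print('loaded {0}'.format(fp))

Because CPython frees an object as soon as its reference count drops to zero, rebinding x this way means at most one file's worth of objects is alive at any point in the loop.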
