
Iterating through CSV leaking memory

I have this function that takes a CSV file on the disk, opens it using a csv.DictReader object, deletes three keys, deletes the file from disk and then returns the data:

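import csv
import os

def process_data(self, filename):
    results = []
    with open(filename) as f:
        data = csv.DictReader(f)
        for row in data:
            # Drop the three unwanted columns from each row.
            del row['foo']
            del row['bar']
            del row['spot']
            results.append(row)
    # Delete the source file once every row has been read.
    os.remove(filename)
    return results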

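I call this function 60 times in sequence (in the same process) to process 60 CSV files. Around 20 files in, the process crashes without an error. After some investigation I found that memory usage increases every time the function is called, until the process is killed for running out of memory.

I used tracemalloc to show which files allocate the most memory, and it seems that each time process_data is called, the csv library allocates around 800 MB of memory: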

import tracemalloc

# Assumes tracemalloc.start() was called before the code being measured ran.
snapshot = tracemalloc.take_snapshot()
top_stats = snapshot.statistics('lineno')
for stat in top_stats[:10]:
    print(stat)

The first two lines of the output are shown below (none of the remaining 8 allocates more than 2 MB):

/usr/local/lib/python3.6/csv.py:120: size=619 MiB, count=7578695, average=86 B
/usr/local/lib/python3.6/csv.py:112: size=214 MiB, count=3404479, average=66 B

Looking at csv.py, both lines are part of the DictReader class. Line 112 advances the reader to the next row, and line 120 builds the OrderedDict from the data.
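One way to check whether memory actually grows from one call to the next, rather than only seeing what is allocated at a single point in time, is to compare two tracemalloc snapshots. The following is a minimal sketch; allocate_rows is a hypothetical stand-in for process_data and is not part of the question's code.

```python
import tracemalloc


def allocate_rows(n):
    # Hypothetical stand-in for process_data(): builds a list of dicts.
    return [{"id": str(i), "value": "x" * 50} for i in range(n)]


tracemalloc.start()
before = tracemalloc.take_snapshot()

rows = allocate_rows(10_000)  # still referenced by `rows`, so it shows up below

after = tracemalloc.take_snapshot()
# compare_to() returns StatisticDiff objects, largest change first.
growth = after.compare_to(before, "lineno")
for stat in growth[:3]:
    print(stat)

# Net change in traced memory between the two snapshots, in bytes.
total_growth = sum(stat.size_diff for stat in growth)
tracemalloc.stop()
```

If the difference between a snapshot taken before a call and one taken after the results have been discarded stays close to zero, the rows themselves are probably not what is being retained.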

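After each function call, I iterated over every object returned by globals() and locals() to check its size. There don't seem to be any large objects, nor an unexpectedly large number of objects.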
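The question does not show how that check was done, so the following is only a sketch of one way it might look. largest_objects is a hypothetical helper. Note that sys.getsizeof reports a shallow size, so a list holding millions of row dicts would be reported as just the size of the list's own pointer array, which could make a large structure look small.

```python
import sys


def largest_objects(namespace, top=5):
    """Return (shallow_size, name) pairs for the biggest objects in a namespace."""
    # Copy the items first so the namespace cannot change size mid-iteration.
    sizes = [(sys.getsizeof(obj), name) for name, obj in list(namespace.items())]
    sizes.sort(reverse=True)
    return sizes[:top]


# Example: inspect the module-level namespace.
for size, name in largest_objects(globals()):
    print(name, size)
```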

I can't work out why this would cause a memory leak in my application. My expectation is that the DictReader object should be cleared once the file is closed (which should happen when the function ends, since I'm using with open), but it seems that memory is being used and never released while reading the data from the CSV.

One thing to keep in mind is that Python doesn't necessarily give memory back to the OS, even when the code no longer references it. So I suggest adding gc.collect() after each call to process_data().
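A minimal sketch of that suggestion might look like the following. process_all, process_one and handle_rows are hypothetical names, not part of the question's code. Keep in mind that gc.collect() only reclaims objects caught in reference cycles and cannot free rows that are still referenced somewhere, so whether it helps here is an assumption to verify.

```python
import gc


def process_all(filenames, process_one, handle_rows):
    """Process files one at a time, collecting garbage between calls."""
    processed = 0
    for filename in filenames:
        rows = process_one(filename)
        handle_rows(rows)
        # Drop the local reference before collecting; gc.collect() cannot
        # free objects that are still reachable.
        del rows
        gc.collect()
        processed += 1
    return processed
```

Here process_one would be something like the process_data method above, and handle_rows whatever consumes the returned list. If handle_rows keeps the rows around, memory will still grow regardless of the collection.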

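I'm not familiar with csv.DictReader, but in case it reads the whole file into memory, the version below reads the file as a stream, one line at a time, and collects the rows into a list. It will still consume a lot of memory if the CSV gets large.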

import os

def process_data(self, filename):
    csv_sep = ','
    results = []
    with open(filename) as f:
        # Parse the header once instead of re-splitting it for every row.
        # Note: a plain split() does not handle quoted fields containing commas.
        header = f.readline().strip().split(csv_sep)
        for line in f:
            data = dict(zip(header, line.strip().split(csv_sep)))
            data.pop('foo')
            data.pop('bar')
            data.pop('spot')
            results.append(data)
    os.remove(filename)
    return results
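If the caller can handle rows one at a time, a generator avoids building the full list at all. This is a sketch under that assumption, not something proposed in the thread; iter_rows and DROP_KEYS are hypothetical names, and it goes back to csv.DictReader so that quoted fields are parsed correctly.

```python
import csv
import os

DROP_KEYS = ("foo", "bar", "spot")  # column names taken from the question


def iter_rows(filename, drop_keys=DROP_KEYS):
    """Yield one cleaned row at a time instead of returning a full list."""
    with open(filename, newline="") as f:
        for row in csv.DictReader(f):
            for key in drop_keys:
                row.pop(key, None)  # ignore columns that are missing
            yield row
    # Reached only once the generator has been fully consumed.
    os.remove(filename)
```

A caller would write something like `for row in iter_rows(path): ...`. The file is deleted only after the loop runs to completion, so breaking out early leaves it on disk.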
