Python memory management question

Question

I have a question about memory management in a specific piece of Python code. Here is the code:

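import csv
import re
import string
from itertools import product

from bs4 import BeautifulSoup as BS  # assumed; the original import is not shown

# bad_tags and stops are module-level collections defined elsewhere (not shown).

def combo_counter(file_path,title_body,validation=None,val_set=None,val_number=None):
    combo_count={}  # maps "body_word tag" -> co-occurrence count

    counter=0

    with open(file_path+"/Train.csv") as r:
        reader=csv.reader(r)
        next(reader)  # skip the header row
        if title_body=='body':
            for row in reader:
                # skip rows whose id lies strictly inside the validation range
                if (validation is not None) and ((int(row[0])>val_set[0]) and (int(row[0])<val_set[-1])):
                    continue

                counter+=1
                if counter%10000==0:
                    print(counter)  # progress indicator

                no_stops=body_parser(row)

                a=' '.join(no_stops)
                b=row[3]
                for x, y in product(a.split(), b.split()):
                    key=x+" "+y  # build the key once per pair
                    if key in combo_count:
                        combo_count[key]+=1
                    else:
                        combo_count[key]=1
    return combo_count

def body_parser(row):
    soup=BS(row[2],'html')
    # drop unwanted elements (those named in bad_tags) together with their contents
    for tag in soup.findAll(True):
        if tag.name in bad_tags:
            tag.extract()
    code_removed=soup.renderContents()
    # strip the remaining markup, then tokenize
    tags_removed=re.sub(r'<[^>]+>', '', code_removed)
    parse_punct=re.findall(r"[\w+#]+(?:[-'][\w+#]+)*|'|[-.(]+|\S[\w+#]*",tags_removed)
    # lowercase, then drop bare punctuation tokens and stop words
    no_punct=' '.join(w.lower() for w in parse_punct if w not in string.punctuation)
    no_stops=[b for b in no_punct.split(' ') if b not in stops]

    return no_stops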

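So basically I read a CSV file line by line, parse each row, and count co-occurrences in a dictionary called combo_count. The problem is that the dictionary is only about 1.2GB once exported, but the code uses far more memory than that while it runs. The only thing I can see that should consume a lot of memory is the dictionary, so I suspect something is holding on to memory that it shouldn't. After each row is processed, everything except the count dictionary should be freed. Can anyone see anything in the code, other than the dictionary, that would eat up memory? I suspect it is in the body_parser function.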

Answer 1

@user

You can use Python's memory_profiler to check which variable is using the most memory and never releasing it.

This add-on provides the @profile decorator, which monitors the memory usage of one specific function. It is very simple to use.

import copy
from memory_profiler import profile

@profile
def function():
    x = list(range(1000000))  # allocate a big list
    y = copy.deepcopy(x)
    del x
    return y

if __name__ == "__main__":
    function()

Run it with:

python -m memory_profiler memory-profile-me.py

This prints output similar to the following:

Line #    Mem usage    Increment   Line Contents
================================================
     4                             @profile
     5      9.11 MB      0.00 MB   def function():
     6     40.05 MB     30.94 MB       x = list(range(1000000)) # allocate a big list
     7     89.73 MB     49.68 MB       y = copy.deepcopy(x)
     8     82.10 MB     -7.63 MB       del x
     9     82.10 MB      0.00 MB       return y
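
If installing memory_profiler is not an option, a coarser, function-level measurement can be sketched with the standard-library tracemalloc module (Python 3.4+). This is a different technique from the line-by-line report above: it only reports the current and peak size of Python-level allocations for one call. The decorator name report_peak and the attribute last_peak_bytes are made up for this illustration.

```python
import copy
import functools
import tracemalloc


def report_peak(func):
    """Hypothetical decorator: print the peak traced memory of one call."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Note: this starts and stops tracing itself, so it should not be
        # combined with other code that already uses tracemalloc.
        tracemalloc.start()
        try:
            result = func(*args, **kwargs)
            # get_traced_memory() returns (current, peak) in bytes
            current, peak = tracemalloc.get_traced_memory()
        finally:
            tracemalloc.stop()
        wrapper.last_peak_bytes = peak  # kept for later inspection
        print("%s: current=%.2f MB, peak=%.2f MB"
              % (func.__name__, current / 1e6, peak / 1e6))
        return result
    return wrapper


@report_peak
def function():
    x = list(range(100000))  # allocate a big list
    y = copy.deepcopy(x)
    del x
    return y


if __name__ == "__main__":
    function()
```

A peak well above the current value suggests that temporaries inside the function are large but are released again before it returns.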

A detailed walkthrough of memory_profiler is also available at http://deeplearning.net/software/theano/tutorial/python-memory-management.html.
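
On the gap the question describes between the 1.2GB export and the memory used while running: a dict with string keys and integer counts generally occupies more memory than the same data written out as text, because every key is a separate str object with its own header and the hash table adds overhead on top. A rough sketch of how one might check this on a small sample, assuming Python 3. The function count_pairs and the sample rows are hypothetical stand-ins for combo_counter and Train.csv, the tab-separated export format is an assumption, and sys.getsizeof only reports each object's own shallow size.

```python
import sys
from collections import Counter
from itertools import product


def count_pairs(rows):
    """Hypothetical stand-in for combo_counter: count (body word, tag) pairs."""
    counts = Counter()
    for body_words, tags in rows:
        for x, y in product(body_words, tags):
            counts[x + " " + y] += 1
    return counts


def in_memory_bytes(counts):
    """Shallow estimate: the dict itself plus every key and value object.

    Small integers are shared objects in CPython, so this overestimates
    the values slightly; the key strings dominate anyway.
    """
    total = sys.getsizeof(counts)
    for key, value in counts.items():
        total += sys.getsizeof(key) + sys.getsizeof(value)
    return total


def exported_bytes(counts):
    """Size of a plain-text export with one 'key<TAB>count' line per entry."""
    return sum(len(("%s\t%d\n" % (k, v)).encode("utf-8"))
               for k, v in counts.items())


# Made-up sample data: (words from a post body, tags of that post)
rows = [(["word%d" % i, "python", "memory"], ["tag%d" % (i % 50), "python"])
        for i in range(2000)]
counts = count_pairs(rows)
print("entries:", len(counts))
print("in memory (estimate):", in_memory_bytes(counts))
print("exported as text:", exported_bytes(counts))
```

If the ratio on a sample of the real data is large, the dictionary alone may account for much of the difference, before any leak in body_parser needs to be assumed.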

Solution 1
0 2016-04-11 04:44:33
