
Memory leak with large dataset when using mysql-python

I am experiencing what I believe is a memory leak when using the MySQLdb API:

Line #    Mem usage    Increment   Line Contents
================================================
     6                             @profile
     7    10.102 MB     0.000 MB   def main():
     8    10.105 MB     0.004 MB       connection = MySQLdb.connect(host="localhost", db="mydb",
     9    11.285 MB     1.180 MB                                    user="notroot", passwd="Admin123", use_unicode=True)
    10    11.285 MB     0.000 MB       cursor = connection.cursor(cursorclass=MySQLdb.cursors.SSCursor)
    11                                 
    12    11.289 MB     0.004 MB       cursor.execute("select * from a big table;")
    13                                 
    14   254.078 MB   242.789 MB       results = [result for result in cursor]
    15   251.672 MB    -2.406 MB       del results
    16   251.672 MB     0.000 MB       return

Also, exploring the heap with guppy/hpy shows that most of my memory is occupied by unicode objects, ints and datetime objects (very likely the rows returned by the MySQLdb API).
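
For reference, the heap inspection can be done along these lines with guppy's hpy interface; setrelheap() narrows the report to objects allocated after the call, and cursor here is the one from the listing above:

    from guppy import hpy

    h = hpy()
    h.setrelheap()                     # baseline: ignore objects that already exist
    results = [result for result in cursor]
    print h.heap()                     # report dominated by unicode, int and datetime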

I'm using Python 2.7.3 with mysql-python==1.2.4 on Ubuntu 12.04, and profiled with memory_profiler.
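
The tables above come from memory_profiler's line-by-line mode. A minimal reconstruction of the profiled script (the file name and table name are placeholders; profile needs no import because memory_profiler injects it into builtins):

    # leak.py
    import MySQLdb
    import MySQLdb.cursors

    @profile
    def main():
        connection = MySQLdb.connect(host="localhost", db="mydb",
                                     user="notroot", passwd="Admin123",
                                     use_unicode=True)
        cursor = connection.cursor(cursorclass=MySQLdb.cursors.SSCursor)
        cursor.execute("select * from a_big_table")
        results = [result for result in cursor]   # the line that balloons to ~250 MB
        del results

    if __name__ == "__main__":
        main()

Running python -m memory_profiler leak.py produces the line-by-line tables shown here.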

Could this be interning, as described in http://effbot.org/pyfaq/why-doesnt-python-release-the-memory-when-i-delete-a-large-object.htm ?
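
A quick way to observe the effect that FAQ describes (a sketch assuming psutil is installed; Python 2 syntax to match the question):

    import os
    import psutil  # third-party, assumed available for this sketch

    proc = psutil.Process(os.getpid())

    def rss_mb():
        return proc.memory_info().rss / 2 ** 20

    print rss_mb(), "MB at start"
    data = range(10 ** 7)       # roughly ten million int objects
    print rss_mb(), "MB after allocating"
    del data
    # The list itself is freed, but on Python 2 the dead ints sit on an
    # internal free list, so RSS usually stays well above the starting figure.
    print rss_mb(), "MB after del"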

Am I missing any dangling references?

EDIT: I also closed the cursor and connection, but still got similar results.

SOLVED: Facepalm. I was doing a list comprehension, which naturally kept everything in memory. When the iterator is consumed properly (streaming to a file or something; see the sketch after the listing below), memory usage stays modest.

Line #    Mem usage    Increment   Line Contents
================================================
    16                             @profile
    17    10.055 MB     0.000 MB   def main():
    18    10.059 MB     0.004 MB       connection = MySQLdb.connect(host="localhost", db="mydb",
    19    11.242 MB     1.184 MB                                    user="notroot", passwd="Admin123", use_unicode=True)
    20    11.242 MB     0.000 MB       cursor = connection.cursor(cursorclass=MySQLdb.cursors.SSCursor)
    21                                 
    22    11.246 MB     0.004 MB       cursor.execute("select * from big table")
    23    11.246 MB     0.000 MB       count = 0
    24    30.887 MB    19.641 MB       for result in cursor:
    25    30.887 MB     0.000 MB           count = count + 1
    26    30.895 MB     0.008 MB       cursor.close()
    27    30.898 MB     0.004 MB       connection.close()
    28    30.898 MB     0.000 MB       return
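
A sketch of the streaming approach mentioned above (table name and output path are placeholders; columns are encoded to UTF-8 because Python 2's csv module only accepts byte strings):

    import csv
    import MySQLdb
    import MySQLdb.cursors

    connection = MySQLdb.connect(host="localhost", db="mydb",
                                 user="notroot", passwd="Admin123",
                                 use_unicode=True)
    cursor = connection.cursor(cursorclass=MySQLdb.cursors.SSCursor)
    cursor.execute("select * from a_big_table")

    with open("dump.csv", "wb") as f:
        writer = csv.writer(f)
        for row in cursor:
            # one row at a time: nothing accumulates in memory
            writer.writerow([col.encode("utf-8") if isinstance(col, unicode) else col
                             for col in row])

    cursor.close()
    connection.close()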

Solved by the OP. His original code contained the line

results = [result for result in cursor]

This list comprehension stored the entire result set in memory, rather than streaming it from the server as needed. The OP replaced it with a simple

for result in cursor:
    ...

and saw his memory usage go back to normal.
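
One caveat worth noting with SSCursor (per the MySQLdb documentation): the whole result set must be consumed, or the cursor closed, before another query can run on the same connection. A hedged illustration:

    cursor = connection.cursor(cursorclass=MySQLdb.cursors.SSCursor)
    cursor.execute("select * from a_big_table")
    for row in cursor:
        pass                    # consume every row of the streamed result
    cursor.close()              # only now is the connection free for reuse
    other = connection.cursor()
    other.execute("select 1")   # issuing this earlier would typically fail
                                # with "Commands out of sync"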
