
How do I decrease the memory used by a large list in python

I'm writing a program. It works fine, but when it loads the database (a 100 MB text file) into a list, its memory usage becomes 700-800 MB.

Code used to load the file into a list:

database = []
db = open('database/db.hdb')
dbcontent = db.read()
for line in dbcontent.split('\n'):
    line = line.split(':')
    database.append(line)

Snippet from db.hdb:

14200:917cb8a3d1d9eb24af6c5bcf3bf7e401:Trojan.Downloader-1420
7168:a105e2cc8148158cd048360eb847c7d0:Trojan.Downloader-1421
7168:c61ef67b5e7eef19ef732f55116742f6:Trojan.Downloader-1422
7168:851b6320148122104f50445ea2684c9f:Trojan.Downloader-1423
7168:ca128383c79a56d930eb4a7ff5026e31:Trojan.Downloader-1424
355204:4af89f8d219f94462cf2f8cb8eb4c6d7:Trojan.Bancos-2053
356984:2bfb53d76891059b79122e13d1537e4a:Trojan.Bancos-2054
363520:edbbdf497cda1ba79c06ea40673d963e:Trojan.Bancos-2055
367616:d85f719b032dbf39800d90ca881fd225:Trojan.Bancos-2056
370688:6cb572fd2452416dc4ea09e3ad917e66:Trojan.Bancos-2057
370688:ef34885677230061649d30ea66d7b0a1:Trojan.Bancos-2058
399360:8578b664706cfdc2f653680bac1b1b6e:Trojan.Bancos-2059
401408:de62af250b5a3e1ba1e9c517629383dd:Trojan.Bancos-2060
622592:8a236340c0a8c76343f6fb581314fadf:Trojan.Bancos-2061
622592:29f3499488ba1814c62fac3c2f3bda54:Trojan.Bancos-2062
622592:5d023bccf2ff097ccbc0ab0eab4a6ee7:Trojan.Bancos-2063
622592:3d6a25ed1f0e2001e72812ce1adf37d3:Trojan.Bancos-2064
622592:eaff242b601807e5805c189752d39124:Trojan.Bancos-2065
623104:8cd8e788d33cf40412d3346a525e4cce:Trojan.Bancos-2066
625152:25470d6895cb0e5c2e7181cb9a201ae0:Trojan.Bancos-2067
625152:436d574cef37b2e62d9b801b8fc2c4f1:Trojan.Bancos-2068
647168:51eb4e43f24cf511e6715cc8667babcd:Trojan.Bancos-2069

(The full file has ~1,800,000 lines)

How do I decrease the memory usage?

You should use the file object as an iterator to reduce the memory used while reading the file. You can then process the database list in chunks rather than all at once. For example:

results = []
database = []
MAX = 100000  # flush the chunk after this many rows
for line in open("database/db.hdb"):
    line = line.rstrip('\n').split(':')
    # You could then manage database in chunks
    database.append(line)
    if len(database) > MAX:
        # do something with the database list so far to get a result
        results.append(process_database(database))
        database = []
if database:
    # don't forget the final partial chunk
    results.append(process_database(database))
# do something now with the individual results to make one result
combine_results(results)
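A minimal runnable sketch of this chunked approach. `process_database` and `combine_results` are left undefined in the answer, so the stubs below (which just count rows and sum the counts) and the in-memory stand-in for the file are my assumptions for illustration:

```python
import io

MAX = 2  # flush the in-memory chunk after this many rows (tiny for the demo)

def process_database(chunk):
    # Hypothetical per-chunk step: here we just count the rows.
    return len(chunk)

def combine_results(results):
    # Hypothetical final step: sum the per-chunk counts.
    return sum(results)

# Stand-in for open('database/db.hdb'); a real run would iterate the file.
db = io.StringIO(
    "14200:917cb8a3d1d9eb24af6c5bcf3bf7e401:Trojan.Downloader-1420\n"
    "7168:a105e2cc8148158cd048360eb847c7d0:Trojan.Downloader-1421\n"
    "7168:c61ef67b5e7eef19ef732f55116742f6:Trojan.Downloader-1422\n"
)

results = []
database = []
for line in db:
    database.append(line.rstrip('\n').split(':'))
    if len(database) >= MAX:
        results.append(process_database(database))
        database = []
if database:  # flush the final partial chunk
    results.append(process_database(database))

total = combine_results(results)
print(total)  # 3 rows processed in chunks of at most 2
```

At any moment only one chunk of at most `MAX` rows is held in memory, rather than the whole ~1,800,000-line list.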

As long as you don't need the complete file in memory, you can read one line at a time:

database = []
db = open('database/db.hdb')
line = db.readline()
while line:
    database.append(line.rstrip('\n').split(':'))
    line = db.readline()
db.close()

See here for details on file.readline().
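As a side note (my sketch, not from the original answers): iterating the file object inside a `with` block achieves the same one-line-at-a-time reading while guaranteeing the file is closed, and storing each record as a tuple rather than a list shaves a little per-row overhead. The temporary file below stands in for `database/db.hdb`:

```python
import os
import sys
import tempfile

# Build a small stand-in for database/db.hdb so the sketch is self-contained.
sample = (
    "14200:917cb8a3d1d9eb24af6c5bcf3bf7e401:Trojan.Downloader-1420\n"
    "7168:a105e2cc8148158cd048360eb847c7d0:Trojan.Downloader-1421\n"
)
with tempfile.NamedTemporaryFile('w', suffix='.hdb', delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

database = []
with open(path) as db:  # iterating the file object reads one line at a time
    for line in db:
        # tuples carry slightly less per-row overhead than lists in CPython
        database.append(tuple(line.rstrip('\n').split(':')))

os.remove(path)

print(len(database))
print(sys.getsizeof(database[0]) <= sys.getsizeof(list(database[0])))
```

`rstrip('\n')` also matters here: without it the last field of every row keeps a trailing newline, which both wastes memory and breaks lookups against the third column.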
