
Reading a big file costs too much memory in Python 2.7

I used .readline() to parse the file line by line, because I need to find the start position at which to begin extracting data into a list and the end point at which to pause extracting, then repeat until the end of the file. The file I am reading is formatted like this:

blabla...
useless....
...
/sign/
data block(e.g. 10 cols x 1000 rows) 
... blank line 
/sign/    
data block(e.g. 10 cols x 1000 rows)
... blank line 
... 
EOF

Let's call this file 'myfile'. Here is my Python snippet:

f = open('myfile', 'r')
blocknum = 0  # counts the data blocks
data = []
while True:
    # find the beginning of the next data block
    while not f.readline().startswith('/sign/'):
        pass
    # create a new sublist to store this data block
    data.append([])
    blocknum += 1
    line = f.readline()

    while line.strip():
        # a blank line marks the end of one block
        data[blocknum - 1].append(["%2.6E" % float(x) for x in line.split()])
        line = f.readline()
    print "Read Block %d" % blocknum
    if not f.readline():
        break

The result is that reading a 500 MB file consumes almost 2 GB of RAM. I cannot figure out why; can somebody help? Thanks very much!

You have quite a lot of non-Pythonic, ambiguous lines in your code. I am not sure, but I think you can first modify your code the following way and then check its memory usage again:

data = []
in_block = False  # are we inside a data block?

with open('myfile', 'r') as f:
    for line in f:
        # '/sign/' starts a block - you can add more markers to check here
        if line.startswith('/sign/'):
            in_block = True
            continue
        # a blank line ends the current block
        if not line.strip():
            in_block = False
            continue
        if in_block:
            data.append(["%2.6E" % float(x) for x in line.split()])
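For a rough sense of where the memory goes (a side note of mine, not part of the original answer): in CPython 2.7, every value stored as a formatted string is a separate heap object with a large fixed overhead, plus an 8-byte list slot per item on a 64-bit build, which is how a 500 MB text file can plausibly become nearly 2 GB of Python objects:

import sys

s = "%2.6E" % 1.2345         # a typical 12-character formatted value
print sys.getsizeof(s)       # ~49 bytes per str object on 64-bit CPython 2.7
print sys.getsizeof(1.2345)  # ~24 bytes for a plain float object
# Each element also costs an 8-byte pointer inside its list, so the
# per-number footprint is several times the ~12 bytes it occupies on disk.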

But I think this code will still use quite a lot of memory. However, if you don't really need to keep all of the file's data in memory at once, you can modify the code to use a generator and process the file line by line; that should save memory, I guess.
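Here is a minimal sketch of that generator idea, assuming the '/sign/' marker and blank-line rules from the question (iter_blocks is a hypothetical helper name, not from the original post). It yields one parsed block at a time, so only the current block lives in memory:

def iter_blocks(path):
    """Yield one data block (a list of rows) at a time."""
    with open(path, 'r') as f:
        block = None
        for line in f:
            if line.startswith('/sign/'):
                block = []               # a marker starts a new block
            elif block is not None and not line.strip():
                yield block              # a blank line ends the block
                block = None
            elif block is not None:
                block.append(["%2.6E" % float(x) for x in line.split()])
        if block:                        # no trailing blank line after last block
            yield block

for blocknum, block in enumerate(iter_blocks('myfile'), 1):
    print "Read Block %d with %d rows" % (blocknum, len(block))

Each yielded block can be processed and discarded before the next one is parsed, so peak memory is bounded by the largest single block rather than by the whole file.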
