
How can I process a very large file in python?

This question was asked earlier, but quite a while ago. I am currently trying to open a very large file (20GB) to manipulate stuff.

I am using:

read_path = '../text/'
time = 3600
data = open(read_path+'genomes'+str(time)).read().replace(',','\n').replace('\n','')

and it works fine when I choose a smaller file in the same directory (genomes1000), but when I change the time to the one matching the larger file I get the error.

The exact error message is:

Tempo:analytics scottjg$ python genomeplot.py 
Traceback (most recent call last):
  File "genomeplot.py", line 27, in <module>
    data = open(read_path+'genomes'+str(time)).read().replace(',','\n').replace('\n','')
OSError: [Errno 22] Invalid argument
Thoughts?

Your code reads the entire contents of the file into memory:

open(read_path+'genomes'+str(time)).read()

I suspect that you do not have enough memory available to accommodate this, and that is probably the reason for the failure. Wouldn't it be better to process the file line by line, reading one line at a time in a loop, instead?
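A minimal sketch of the line-by-line approach (the function name is illustrative, not from the question; the file is assumed to be comma-separated text like the one in the traceback):

```python
def process_large_file(path):
    """Yield the comma-separated fields of each line.

    Iterating over the file object reads one line at a time, so only a
    single line is ever held in memory -- unlike .read(), which tries
    to load the whole 20GB file at once.
    """
    with open(path) as f:
        for line in f:
            yield line.rstrip('\n').split(',')
```

Each yielded row can then be processed (or written out) immediately, keeping memory use roughly constant regardless of the file's size.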
