
How do I read data into Python but not entirely into memory?

I need to parse through a file of about 100,000 records. Is there a way to do this without loading the whole file into memory? Does the csv module already do this (i.e., not load the entire file into memory)? If it matters, I plan on doing this in IDLE.

I've never used the csv module, but you'll want to look into using a generator; this will let you process one record at a time without reading the entire file in at once. For example, with a file, you can do something like...

def read_file(some_file):
    # Use a with-statement so the file is closed once the generator is exhausted.
    with open(some_file) as f:
        for line in f:
            yield line

all_lines = read_file("foo")
results = process(all_lines)

The all_lines will be a generator that yields one line at a time as it is iterated over, as in:

for line in all_lines:
    ...

I'd imagine you can do this with the csv module as well.
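For instance, here is a minimal sketch of the same lazy pattern using csv.reader. The file name records.csv and its sample contents are just placeholders for illustration; the point is that csv.reader wraps the file iterator and parses rows only as they are requested:

```python
import csv

# Hypothetical sample data, written out so the example is self-contained.
with open("records.csv", "w", newline="") as f:
    f.write("id,name\n1,alice\n2,bob\n")

def read_records(path):
    # csv.reader pulls lines from the underlying file iterator one at a
    # time, so the whole file is never loaded into memory at once.
    with open(path, newline="") as f:
        for row in csv.reader(f):
            yield row

rows = read_records("records.csv")
header = next(rows)   # ['id', 'name']
for row in rows:
    print(row)        # one parsed record at a time, e.g. ['1', 'alice']
```

Because read_records is itself a generator, you can loop over 100,000 records with only one row in memory at any moment.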

