
Python Read Large Text Files

I was trying to compare two large text files (10 GB each) line by line without loading the entire files into memory. I used the following code, as suggested in other threads:

with open(in_file1,"r") as f1, open(in_file2,"r") as f2:
    for (line1, line2) in zip(f1, f2):
        compare(line1, line2)

But it seems that Python fails to read the files line by line: memory usage climbed above 20 GB while the code was running. I also tried:

import fileinput
for line1, line2 in zip(fileinput.input([in_file1]), fileinput.input([in_file2])):
    compare(line1, line2)

This one also tries to load everything into memory. I'm using Python 2.7.4 on CentOS 5.9, and I am not storing any of the lines in my code.

What was going wrong in my code? How should I change it to avoid loading everything into RAM?

In Python 2, the built-in zip function returns a list of tuples, so it reads both files in full to build that list. Use itertools.izip instead; it returns an iterator of tuples and pairs the lines lazily.

with open(in_file1,"r") as f1, open(in_file2,"r") as f2:
    for (line1, line2) in izip(f1, f2):
        compare(line1, line2)
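
Note that izip stops as soon as the shorter file is exhausted, so any trailing lines in the longer file are skipped silently. If your files can differ in length, itertools.izip_longest (available since Python 2.6) keeps going and pads the shorter side with a fill value. A minimal sketch, assuming your compare function can treat None as a missing line:

from itertools import izip_longest

with open(in_file1, "r") as f1, open(in_file2, "r") as f2:
    # Also lazy, but iterates until the longer file is exhausted,
    # yielding fillvalue (None here) for the missing side.
    for line1, line2 in izip_longest(f1, f2, fillvalue=None):
        compare(line1, line2)  # assumption: compare() handles None

On Python 3, the built-in zip is already lazy (and itertools.izip no longer exists), so your original code would work there unchanged.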
