
Python: analysing two large files simultaneously, line by line

I'm trying to analyse two ±6 GB files. I need to analyse them simultaneously, because I need two lines at the same time (one from each file). I tried to do something like this:

with open(fileOne, "r") as First_file:
    for index, line in enumerate(First_file):
        pass  # do some stuff here

    with open(fileTwo, "r") as Second_file:
        for index, line in enumerate(Second_file):
            pass  # do stuff here as well

The problem is that the second "with open" loop starts at the beginning of the second file (and only runs after the first loop has finished), so the two files are never read in step and the analysis takes far too long. I also tried this:

with open(fileOne, "r") as f1, open(fileTwo, "r") as f2:
    for index, (line_R1, line_R2) in enumerate(zip(f1, f2)):
        pass  # do the analysis on the paired lines here

The problem is that this loads both files directly into memory. I need the same line from each file, and the lines I want satisfy:

number_line % 4 == 1

With the 0-based line numbers that enumerate() produces, this selects indices 1, 5, 9, 13, etc., i.e. lines 2, 6, 10, 14 in 1-based counting. I need those lines from both files.
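
In other words, for a single file the selection would look something like this (a minimal sketch; process() is a placeholder for the real per-line analysis, not an existing function):

with open(fileOne, "r") as f1:
    for index, line in enumerate(f1):
        if index % 4 == 1:   # 0-based index: lines 2, 6, 10, 14, ... in 1-based terms
            process(line)    # placeholder for the actual analysis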

Is there a faster and more memory-efficient way to do this?

In Python 2, use itertools.izip() to prevent the files from being loaded into memory:

from itertools import izip

with open(fileOne, "r") as f1, open(fileTwo, "r") as f2:
    for index, (line_R1, line_R2) in enumerate(izip(f1, f2)):
        if index % 4 == 1:  # keep only the lines the question asks about
            pass  # analyse line_R1 and line_R2 here

In Python 2, the built-in zip() function reads both file objects into memory in their entirety before pairing them; izip() instead retrieves one pair of lines at a time. (In Python 3, zip() is already lazy, so it can be used directly there.)
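
If only every fourth pair is needed, itertools.islice() can also do the selection lazily, which removes the per-line index test; a sketch under the same Python 2 assumptions:

from itertools import islice, izip

with open(fileOne, "r") as f1, open(fileTwo, "r") as f2:
    # islice(..., 1, None, 4) lazily yields the pairs at 0-based indices 1, 5, 9, ...
    for line_R1, line_R2 in islice(izip(f1, f2), 1, None, 4):
        pass  # analyse line_R1 and line_R2 here

Because both izip() and islice() are iterators, only one pair of lines is held in memory at a time regardless of file size.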
