
Reading two text files line by line simultaneously

I have two text files in two different languages, aligned line by line; i.e., the first line in textfile1 corresponds to the first line in textfile2, and so on.

Is there a way to read both files line by line simultaneously?

Below is a sample of what the files look like; imagine each file has around 1,000,000 lines.

textfile1:

This is a the first line in English
This is a the 2nd line in English
This is a the third line in English

textfile2:

C'est la première ligne en Français
C'est la deuxième ligne en Français
C'est la troisième ligne en Français

Desired output:

This is a the first line in English\tC'est la première ligne en Français
This is a the 2nd line in English\tC'est la deuxième ligne en Français
This is a the third line in English\tC'est la troisième ligne en Français

There is a Java version of this question, "Read two textfile line by line simultaneously -java", but Python doesn't use a BufferedReader to read line by line. So how would it be done in Python?

    with open("textfile1") as textfile1, open("textfile2") as textfile2:
        # Python 3: the built-in zip is lazy and pairs the files line by line
        for x, y in zip(textfile1, textfile2):
            x = x.strip()
            y = y.strip()
            print(f"{x}\t{y}")

In Python 2, replace the built-in zip with itertools.izip:

    from itertools import izip

    with open("textfile1") as textfile1, open("textfile2") as textfile2: 
        for x, y in izip(textfile1, textfile2):
            x = x.strip()
            y = y.strip()
            print("{0}\t{1}".format(x, y))

# file1 and file2 hold the paths of the two files
with open(file1) as f1, open(file2) as f2:
    for x, y in zip(f1, f2):
        print("{0}\t{1}".format(x.strip(), y.strip()))

Output:

This is a the first line in English C'est la première ligne en Français
This is a the 2nd line in English   C'est la deuxième ligne en Français
This is a the third line in English C'est la troisième ligne en Français
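
If the merged lines should go to a file rather than the console, the same loop can write them out directly. This is only a minimal sketch, assuming Python 3 and UTF-8 encoded files; the function name `merge_files` and its parameters are hypothetical and do not come from the answers in this thread.

```python
def merge_files(path1, path2, out_path, encoding="utf-8"):
    """Write tab-joined pairs of lines from two aligned files.

    Returns the number of pairs written. The name and signature are
    illustrative assumptions, not part of any library.
    """
    count = 0
    with open(path1, encoding=encoding) as f1, \
         open(path2, encoding=encoding) as f2, \
         open(out_path, "w", encoding=encoding) as out:
        # zip pulls one line from each file per step, so memory use stays
        # small even with around 1,000,000 lines per file.
        for left, right in zip(f1, f2):
            out.write(left.rstrip("\n") + "\t" + right.rstrip("\n") + "\n")
            count += 1
    return count
```

Calling `merge_files("textfile1", "textfile2", "merged.tsv")` would then produce one tab-separated line per pair; like plain `zip`, it stops at the end of the shorter file.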

We could use a generator for more convenient file opening, and this approach easily extends to iterating over more files simultaneously.

filenames = ['textfile1', 'textfile2']

def gen_line(filename):
    with open(filename) as f:
        for line in f:
            yield line.strip()

gens = [gen_line(n) for n in filenames]

for file1_line, file2_line in zip(*gens):
    print("\t".join([file1_line, file2_line]))

Note:

  1. This is Python 3 code. For Python 2, use itertools.izip, as other answers suggest.
  2. zip stops as soon as the shortest file is exhausted; use itertools.zip_longest if that matters.
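
To illustrate the second note, here is a small sketch, assuming Python 3, that pairs lines with `itertools.zip_longest` so the extra lines of the longer input are kept. The `io.StringIO` objects are stand-ins for real files, and the sample text and the empty-string `fillvalue` are illustrative choices.

```python
import io
from itertools import zip_longest

# In-memory stand-ins for two files of unequal length (illustrative data).
english = io.StringIO("first\nsecond\nthird\n")
french = io.StringIO("première\ndeuxième\n")

pairs = []
# fillvalue="" stands in for the missing line once the shorter input runs out.
for en, fr in zip_longest(english, french, fillvalue=""):
    pairs.append((en.strip(), fr.strip()))

for en, fr in pairs:
    print(f"{en}\t{fr}")
```

With plain `zip`, the pair for "third" would be dropped silently; with `zip_longest` it appears with an empty second column, which also makes misaligned files easier to spot.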

Python does let you read line by line, and it's even the default behaviour: you just iterate over the file as you would iterate over a list.
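
As a small illustration of that point (a sketch, assuming Python 3; the temporary file and its contents are made up for the demo), a file object hands out one line per loop step:

```python
import os
import tempfile

# Create a throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False,
                                 encoding="utf-8") as tmp:
    tmp.write("line one\nline two\nline three\n")
    path = tmp.name

lines_seen = []
with open(path, encoding="utf-8") as f:
    # The file object is its own iterator: each step reads the next line.
    for line in f:
        lines_seen.append(line.rstrip("\n"))

os.remove(path)  # clean up the throwaway file
print(lines_seen)
```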

With regard to iterating over two iterables at once, itertools.izip is your friend:

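# Python 2 code: izip pairs the lines lazily instead of building a full list
from itertools import izip

# the with statement closes both files when the loop finishes
with open("/path/to/file1") as fileA, open("/path/to/file2") as fileB:
    for lineA, lineB in izip(fileA, fileB):
        print "%s\t%s" % (lineA.rstrip(), lineB.rstrip())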


 