Compare two files' lines with Python

This might sound a little bit stupid, but I have been having a hard time figuring it out. I have two text files, and all I want to do is compare each line of the first file with all of the lines of the second file. So far I just wanted to test a small part of my code, which is:

for line1 in file1:
    print line1
    for line2 in file2:
        print line2

I thought this small piece of code would give me a line from the first file followed by all the lines from the second file. But it behaves quite differently. It gives me this:

in file 1 line 1
in file 2 line 1
in file 2 line 2
in file 2 line 3
in file 1 line 2

What I expect to see:

in file 1 line 1
in file 2 line 1
in file 2 line 2
in file 2 line 3

in file 1 line 2
in file 2 line 1
in file 2 line 2
in file 2 line 3

Any idea what I might be doing wrong here?

PLEASE NOTE: I don't want to just compare whole lines with each other to check whether they are the same; I need to do some string operations first, so zip and similar tools won't help me.

Thanks in advance

What has happened here is that a file object is an iterator, and the first pass of the inner loop exhausted it: once every line has been read, further iteration yields nothing. You can see that by trying to loop over the same file twice:

>>> f2=open("CLI.md")
>>> for i in f2:
...     print(i)
... 
The CLI
(file contents...)
>>> for i in f2:
...     print(i)
... 
>>>
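The same behaviour can be reproduced without a file on disk. This is a minimal sketch, assuming `io.StringIO` as a stand-in for a file opened in text mode; the sample content is made up for illustration.

```python
import io

# io.StringIO stands in for an opened text file; the content is illustrative.
f2 = io.StringIO("in file 2 line 1\nin file 2 line 2\nin file 2 line 3\n")

first_pass = [line.rstrip("\n") for line in f2]   # consumes the iterator
second_pass = [line.rstrip("\n") for line in f2]  # nothing is left to read

print(first_pass)
print(second_pass)
```

The second list comes back empty because the file position is already at the end after the first pass.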

The best way of handling that here is to convert the file used in the inner loop to a list before looping:

# Read file2 once into a list so it can be iterated over repeatedly
file2_lines = list(file2)
for line1 in file1:
    print(line1)
    for line2 in file2_lines:
        print(line2)
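Since the question mentions doing string operations before comparing, the list approach can be sketched end to end. This is only an illustration: `normalize` and `matching_pairs` are hypothetical helpers, the normalization rule (strip whitespace, lowercase) is an assumption, and `io.StringIO` objects stand in for the two open files.

```python
import io

def normalize(line):
    """Hypothetical pre-processing step: trim whitespace and ignore case."""
    return line.strip().lower()

def matching_pairs(file1, file2):
    """Compare every line of file1 with every line of file2 after normalizing."""
    # Read the inner file once so it can be looped over repeatedly.
    file2_lines = [normalize(line) for line in file2]
    pairs = []
    for line1 in file1:
        key = normalize(line1)
        for line2 in file2_lines:
            if key == line2:
                pairs.append((key, line2))
    return pairs

# Illustrative data standing in for the two files.
file1 = io.StringIO("Apple\nbanana \ncherry\n")
file2 = io.StringIO("BANANA\napple\ndate\n")
result = matching_pairs(file1, file2)
print(result)
```

Normalizing the second file's lines once, up front, also avoids repeating that work for every line of the first file.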

Also see: exhausted iterators - what to do about them?
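If the second file is too large to hold in memory, another option is to rewind it before each inner pass. This is a minimal sketch, assuming the file object is seekable (true for regular files opened with `open()`, not for pipes); `io.StringIO` is used as a stand-in and the contents are made up.

```python
import io

# Stand-ins for the two open files; the contents are illustrative.
file1 = io.StringIO("in file 1 line 1\nin file 1 line 2\n")
file2 = io.StringIO("in file 2 line 1\nin file 2 line 2\n")

seen = []
for line1 in file1:
    seen.append(line1.strip())
    file2.seek(0)  # rewind so the inner loop starts from the first line again
    for line2 in file2:
        seen.append(line2.strip())

print("\n".join(seen))
```

This trades memory for repeated reads of the second file, so it is usually slower than building the list once.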

zip may be your friend here.

For example,

for line_a, line_b in zip(file_1, file_2):
    # do something with your strings
    pass

Sample terminal code:

>>> file_1 = ['a', 'b', 'c', 'd']
>>> file_2 = ['a', 'one', 'c', 'd', 'e']
>>> for a, b in zip(file_1, file_2):
...   if a == b:
...     print('equal!')
...   else:
...     print('nope!')
... 
equal!
nope!
equal!
equal!
>>> for a, b in zip(file_2, file_1):
...   print(a, b)
... 
a a
one b
c c
d d

Notice anything strange?

As per the Python Docs, "zip() should only be used with unequal length inputs when you don't care about trailing, unmatched values from the longer iterables. If those values are important, use itertools.zip_longest() instead."
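To illustrate the quoted advice, here is a short sketch that runs the same sample lists through `itertools.zip_longest`. The choice of `fillvalue=None` is just one option; any placeholder value would do.

```python
from itertools import zip_longest

file_1 = ['a', 'b', 'c', 'd']
file_2 = ['a', 'one', 'c', 'd', 'e']

# Missing items from the shorter input are filled with None instead of being dropped.
pairs = list(zip_longest(file_1, file_2, fillvalue=None))
for a, b in pairs:
    print(a, b)
```

Unlike plain `zip`, the trailing `'e'` from the longer list is kept and paired with the fill value.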
