Comparing two files line by line

I have a program that takes two files and compares them line by line. It works fine as long as both files have the same number of lines. My problem is what happens when, for example, file2 has more lines than file1, or the other way around. When that happens I get "IndexError: list index out of range". What can I do to take this into account?

# Compares two files
def compare(baseline, newestFile):
    baselineHolder = open(baseline)
    newestFileHolder = open(newestFile)

    lines1 = baselineHolder.readlines()
    a = returnName(baseline)
    b = returnName(newestFile)

    for i,lines2 in enumerate(newestFileHolder):
        if lines2 != lines1[i]:
            add1 = i + 1
            print ("line ", add1, " in newestFile is different \n")
            print("TAKE A LOOK HERE----------------------TAKE A LOOK HERE")
            print (lines2)
        else:
            addRow = 1 + i
            print ("line  " + str(addRow) + " is identical")

Instead of reinventing the wheel, why not use the standard-library difflib module? Here is an example using difflib.unified_diff from the docs:

>>> s1 = ['bacon\n', 'eggs\n', 'ham\n', 'guido\n']
>>> s2 = ['python\n', 'eggy\n', 'hamster\n', 'guido\n']
>>> for line in unified_diff(s1, s2, fromfile='before.py', tofile='after.py'):
...     sys.stdout.write(line)
--- before.py
+++ after.py
@@ -1,4 +1,4 @@
-bacon
-eggs
-ham
+python
+eggy
+hamster
 guido
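Applied to the two files from the question, the same idea might look like the sketch below. The function names diff_lines and diff_files and the default labels "baseline" and "newest" are made up for illustration; only difflib.unified_diff comes from the standard library. Files of different lengths are not a special case here, because the extra lines simply show up as additions or deletions.

```python
import difflib


def diff_lines(baseline_lines, newest_lines, fromfile="baseline", tofile="newest"):
    """Return the unified diff of two lists of lines as a list of strings."""
    return list(
        difflib.unified_diff(
            baseline_lines, newest_lines, fromfile=fromfile, tofile=tofile
        )
    )


def diff_files(baseline_path, newest_path):
    """Read both files and diff them; the paths double as labels in the header."""
    with open(baseline_path) as baseline, open(newest_path) as newest:
        return diff_lines(
            baseline.readlines(), newest.readlines(), baseline_path, newest_path
        )
```

An empty result means the files are identical; otherwise something like sys.stdout.writelines(diff_files('old.txt', 'new.txt')) prints the diff (the file names are placeholders).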

Perhaps you can use itertools.izip_longest. Once one sequence has been exhausted, it emits a fill value for it (None by default):

import itertools

for l, r in itertools.izip_longest(open('foo.txt'), open('bar.txt')):
    if l is None: # foo.txt has been exhausted
        ...
    elif r is None: # bar.txt has been exhausted
        ...
    else: # both still have lines - compare now the content of l and r
        ...

Edit: As @danidee correctly notes, in Python 3 it is zip_longest.
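For Python 3, a filled-in version of that loop could look roughly like this. It is only a sketch: the function name compare_lines and the status strings are invented for the example. Relying on None as the fill value is safe here because a line read from a file is always a string, never None.

```python
from itertools import zip_longest


def compare_lines(baseline_lines, newest_lines):
    """Return a list of (line_number, status) pairs, one per line position."""
    report = []
    pairs = zip_longest(baseline_lines, newest_lines)
    for number, (old, new) in enumerate(pairs, start=1):
        if old is None:
            # The baseline ran out first, so this line exists only in newest.
            report.append((number, "only in newest"))
        elif new is None:
            # The newest file ran out first, so this line exists only in baseline.
            report.append((number, "only in baseline"))
        elif old != new:
            report.append((number, "different"))
        else:
            report.append((number, "identical"))
    return report
```

Because open file objects are iterable line by line, they can be passed in directly, for example inside a with open(...) as f1, open(...) as f2: block.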

You should catch the IndexError and then stop the comparison:

    for i,lines2 in enumerate(newestFileHolder):
        try:
            if lines2 != lines1[i]:
                add1 = i + 1
                print ("line ", add1, " in newestFile is different \n")
                print("TAKE A LOOK HERE----------------------TAKE A LOOK HERE")    
                print (lines2)
            else:
                addRow = 1 + i
                print ("line  " + str(addRow) + " is identical")
        except IndexError:
            print("Exit comparison")
            break
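Note that this loop only raises IndexError when newestFile is the longer file; when the baseline is longer, the loop just ends and the extra baseline lines go unreported. One way to cover both directions without exceptions is to compare only the shared range and return the leftovers separately. The sketch below assumes both files have already been read into lists with readlines(); the name compare_common_prefix is hypothetical.

```python
def compare_common_prefix(lines1, lines2):
    """Compare the lines both lists share, then collect whatever is left over.

    Returns (differing, extra1, extra2): the 1-based numbers of lines that
    differ, the extra lines of lines1, and the extra lines of lines2.
    """
    common = min(len(lines1), len(lines2))
    differing = [i + 1 for i in range(common) if lines1[i] != lines2[i]]
    extra1 = lines1[common:]  # non-empty only if lines1 is longer
    extra2 = lines2[common:]  # non-empty only if lines2 is longer
    return differing, extra1, extra2
```

At most one of the two leftover lists can be non-empty, so the caller can tell which file is longer and by how many lines.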
