
Comparing two .txt files using difflib in Python

I am trying to compare two text files and output the first string in the comparison file that does not match, but I am having difficulty since I am very new to Python. Can anybody please give me a sample of how to use this module?

When I try something like:

result = difflib.SequenceMatcher(None, testFile, comparisonFile)

I get an error saying that an object of type 'file' has no len.

For starters, you need to pass strings to difflib.SequenceMatcher, not files:

# Like so
difflib.SequenceMatcher(None, str1, str2)

# Or just read the files in
difflib.SequenceMatcher(None, file1.read(), file2.read())

That'll fix your error.

To get the first non-matching string, see the difflib documentation.
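As a rough sketch of what that could look like, SequenceMatcher also accepts lists of lines, and its get_opcodes() method reports where the two sequences diverge. The helper name first_mismatch and the sample lines below are made up for illustration. Lines that exist only in the test file ('delete' opcodes) are skipped here, since they have no counterpart in the comparison file to report.

```python
import difflib

def first_mismatch(test_lines, comparison_lines):
    """Return the first line of comparison_lines with no match in test_lines, or None."""
    matcher = difflib.SequenceMatcher(None, test_lines, comparison_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # 'replace' and 'insert' both mean comparison_lines[j1:j2] is unmatched
        if tag in ("replace", "insert"):
            return comparison_lines[j1]
    return None

# Sample data standing in for testFile.readlines() and comparisonFile.readlines()
test_lines = ["alpha\n", "beta\n", "gamma\n"]
comparison_lines = ["alpha\n", "BETA\n", "gamma\n"]

print(repr(first_mismatch(test_lines, comparison_lines)))
```

With real files, you would pass the result of readlines() for each file instead of the sample lists.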

Here is a quick example of comparing the contents of two files using Python difflib...

import difflib

file1 = "myFile1.txt"
file2 = "myFile2.txt"

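# Read both files as lists of lines and print a line-by-line delta
with open(file1) as f1, open(file2) as f2:
    diff = difflib.ndiff(f1.readlines(), f2.readlines())
print(''.join(diff), end='')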

Are you sure both files exist?

I just tested it and got a perfect result.

To get the results I use something like:

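import difflib

# testFile and comparisonFile are the paths of the two files to compare
with open(testFile) as f1, open(comparisonFile) as f2:
    diff = difflib.ndiff(f1.readlines(), f2.readlines())

# ndiff returns a generator of delta lines, so iterate over it directly
for line in diff:
    print(line, end='')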

The first character of each line indicates whether the lines differ; e.g., '+' means the following line has been added, etc.
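For reference, ndiff prefixes each output line with a two-character code: '- ' for a line only in the first sequence, '+ ' for a line only in the second, two spaces for a line common to both, and '? ' for a hint line that appears in neither input. A minimal sketch that uses those prefixes to pull out the first line found only in the comparison file follows; the helper name first_added_line and the sample lines are invented for the example.

```python
import difflib

def first_added_line(test_lines, comparison_lines):
    """Return the first line that ndiff marks as present only in comparison_lines."""
    for line in difflib.ndiff(test_lines, comparison_lines):
        if line.startswith("+ "):
            # Strip the two-character prefix to recover the original line
            return line[2:]
    return None

# Sample data standing in for the contents of the two files
test_lines = ["one\n", "two\n", "three\n"]
comparison_lines = ["one\n", "2\n", "three\n"]

print(repr(first_added_line(test_lines, comparison_lines)))
```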

It sounds like you may not need difflib at all. If you're comparing line by line, try something like this:

with open("test.txt") as test_file, open("correct.txt") as correct_file:
    test_lines = test_file.readlines()
    correct_lines = correct_file.readlines()

# Report the first pair of lines that differ
for test, correct in zip(test_lines, correct_lines):
    if test != correct:
        print("Oh no! Expected %r; got %r." % (correct, test))
        break
else:
    # No mismatch in the shared lines, so compare the line counts
    len_diff = len(test_lines) - len(correct_lines)
    if len_diff > 0:
        print("Test file had too much data.")
    elif len_diff < 0:
        print("Test file had too little data.")
    else:
        print("Everything was correct!")

Another, easier method checks whether two text files are the same line by line. Try it out.

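import sys

fname1 = 'text1.txt'
fname2 = 'text2.txt'

with open(fname1) as f1, open(fname2) as f2:
    lines1 = f1.readlines()
    lines2 = f2.readlines()

# Print the first line of file 1 that differs from file 2, then stop
for line1, line2 in zip(lines1, lines2):
    if line1 != line2:
        print(line1)
        sys.exit(0)

# zip stops at the shorter file, so also check the line counts
if len(lines1) != len(lines2):
    print("files differ in length")
    sys.exit(0)

print("both are equal")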

Otherwise, Python has a built-in module, filecmp, which you can use.

import filecmp

fname1 = 'text1.txt'
fname2 = 'text2.txt'

print(filecmp.cmp(fname1, fname2))
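One caveat worth knowing: by default filecmp.cmp does a shallow comparison and treats files as equal when their os.stat() signatures (file type, size, modification time) match. Passing shallow=False forces it to compare the actual contents. The sketch below writes two temporary files so that it runs on its own; the wrapper name same_contents and the file contents are made up for the example.

```python
import filecmp
import os
import tempfile

def same_contents(path1, path2):
    """Compare two files byte by byte rather than by their os.stat() signatures."""
    return filecmp.cmp(path1, path2, shallow=False)

with tempfile.TemporaryDirectory() as tmp:
    fname1 = os.path.join(tmp, "text1.txt")
    fname2 = os.path.join(tmp, "text2.txt")
    # Same size, different content
    with open(fname1, "w") as f:
        f.write("hello\nworld\n")
    with open(fname2, "w") as f:
        f.write("hello\nWorld\n")
    result = same_contents(fname1, fname2)

print(result)
```

Note that filecmp only tells you whether the files match; it does not say where they differ.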

:)
