Comparing two text files in Python

I need to compare two files and redirect the differing lines to a third file. I know I can get the difference with the diff command, but is there a way to do it in Python? Any sample code would be helpful.

Check out difflib:

This module provides classes and functions for comparing sequences. It can be used for example, for comparing files, and can produce difference information in various formats, including HTML and context and unified diffs[...]

There is a command-line example at http://docs.python.org/library/difflib.html#difflib-interface
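
As a rough illustration of the formats mentioned in that description, here is a minimal sketch using two difflib functions on in-memory lists of lines. The sample lines and the names old.txt and new.txt are made up for the example; with real files you would pass the result of readlines() instead.

```python
import difflib

# Hypothetical sample data; each entry keeps its trailing newline,
# just like the lists returned by file.readlines().
old_lines = ["a\n", "b\n", "c\n"]
new_lines = ["a\n", "x\n", "c\n"]

# context_diff produces a context-style diff, one output line at a time.
context = list(difflib.context_diff(old_lines, new_lines,
                                    fromfile="old.txt", tofile="new.txt"))
print("".join(context))

# ndiff marks every line: "  " unchanged, "- " only in old, "+ " only in new.
delta = list(difflib.ndiff(old_lines, new_lines))
print("".join(delta))
```

Both functions return generators, so nothing is computed until the result is iterated or joined.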

# Compare two text files line by line.

with open("test1.txt") as f1, open("test2.txt") as f2:
    test1list = f1.readlines()  # read all lines into a list
    test2list = f2.readlines()

with open("test3.txt", "w") as out:
    # zip pairs the lines by position and stops at the end of the shorter file
    for k, (i, j) in enumerate(zip(test1list, test2list), start=1):
        if i != j:
            out.write("Line Number:" + str(k) + ' ')
            out.write(i.rstrip("\n") + ' ' + j)
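
One limitation of pairing lines with zip is that it stops at the end of the shorter file, so extra trailing lines in the longer file are never reported. A possible variation, sketched here with itertools.zip_longest, pads the shorter input with None. The function name differing_lines and the sample lists are hypothetical, chosen only for this illustration.

```python
from itertools import zip_longest


def differing_lines(lines1, lines2):
    """Yield (line_number, line1, line2) for each position where the lines differ.

    A missing line (one input is shorter) is reported as None.
    """
    pairs = zip_longest(lines1, lines2, fillvalue=None)
    for number, (a, b) in enumerate(pairs, start=1):
        if a != b:
            yield number, a, b


# Hypothetical sample data standing in for file.readlines() results.
first = ["a\n", "b\n", "c\n"]
second = ["a\n", "B\n", "c\n", "d\n"]

report = list(differing_lines(first, second))
for number, a, b in report:
    print(number, repr(a), repr(b))
```

This is still a purely positional comparison: a single inserted line makes every following line count as different, which is the case difflib handles better.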

Comparing two text files in Python?

Sure, difflib makes it easy.

Let's set up a demo:

f1path = 'file1'
f2path = 'file2'
text1 = '\n'.join(['a', 'b', 'c', 'd', ''])
text2 = '\n'.join(['a', 'ba', 'bb', 'c', 'def', ''])
for path, text in ((f1path, text1), (f2path, text2)):
    with open(path, 'w') as f:
        f.write(text)

Now to inspect a diff. The lines that use os and time only provide a readable timestamp for when each file was last modified. They are completely optional, as are the corresponding arguments to difflib.unified_diff:

# optional imports:
import os
import time
# necessary import:
import difflib

Now we open the files, pass a list of their lines (from f.readlines) to difflib.unified_diff, join the resulting lines with an empty string, and print the result:

# The 'U' mode flag was removed in Python 3.11; plain text mode already
# handles universal newlines.
with open(f1path) as f1, open(f2path) as f2:
    readable_last_modified_time1 = time.ctime(os.path.getmtime(f1path))  # not required
    readable_last_modified_time2 = time.ctime(os.path.getmtime(f2path))  # not required
    print(''.join(difflib.unified_diff(
        f1.readlines(), f2.readlines(), fromfile=f1path, tofile=f2path,
        fromfiledate=readable_last_modified_time1,  # not required
        tofiledate=readable_last_modified_time2,  # not required
    )))

which prints:

--- file1       Mon Jul 27 08:38:02 2015
+++ file2       Mon Jul 27 08:38:02 2015
@@ -1,4 +1,5 @@
 a
-b
+ba
+bb
 c
-d
+def

Again, you can remove all the lines marked optional / not required and get the same result, just without the timestamps.

Redirect the different lines to a third file

Instead of printing, open a third file and write the lines to it:

with open(f1path) as f1, open(f2path) as f2:
    difftext = ''.join(difflib.unified_diff(
        f1.readlines(), f2.readlines(), fromfile=f1path, tofile=f2path,
        fromfiledate=readable_last_modified_time1,  # not required
        tofiledate=readable_last_modified_time2,  # not required
    ))
with open('diffon1and2', 'w') as diff_file:
    diff_file.write(difftext)

and:

$ cat diffon1and2
--- file1       Mon Jul 27 11:38:02 2015
+++ file2       Mon Jul 27 11:38:02 2015
@@ -1,4 +1,5 @@
 a
-b
+ba
+bb
 c
-d
+def
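
The steps above could be wrapped into one reusable function. The following is only a sketch under some assumptions: the name write_unified_diff is hypothetical, the timestamps are left out, and the demo writes into a temporary directory so it does not touch existing files. The file contents match the demo files set up earlier.

```python
import difflib
import os
import tempfile


def write_unified_diff(path1, path2, out_path):
    """Write a unified diff of two text files to out_path.

    Returns the number of lines in the diff (0 means the files are identical).
    """
    with open(path1) as f1, open(path2) as f2:
        lines1, lines2 = f1.readlines(), f2.readlines()
    diff = list(difflib.unified_diff(lines1, lines2, fromfile=path1, tofile=path2))
    with open(out_path, "w") as out:
        out.writelines(diff)  # diff lines already end with a newline
    return len(diff)


# Demo in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    p1, p2, p3 = (os.path.join(tmp, name) for name in ("file1", "file2", "diff"))
    with open(p1, "w") as f:
        f.write("a\nb\nc\nd\n")
    with open(p2, "w") as f:
        f.write("a\nba\nbb\nc\ndef\n")
    diff_line_count = write_unified_diff(p1, p2, p3)
    with open(p3) as f:
        diff_text = f.read()

print(diff_text)
```

Returning the line count gives the caller a cheap way to tell whether the files differed without reading the output file again.
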

import sys

if len(sys.argv) != 3:
    print("usage: " + sys.argv[0] + " file1 file2")
    sys.exit(1)

# Treat each file as a set of lines: order and duplicates are ignored.
with open(sys.argv[1]) as f1, open(sys.argv[2]) as f2:
    lines1 = set(f1)
    lines2 = set(f2)

only_in_2 = lines2.difference(lines1)
only_in_1 = lines1.difference(lines2)

with open('file3', 'w') as out:
    out.write("file2-contains but file1 not\n")
    out.writelines(sorted(only_in_2))  # sorted, because sets have no stable order
    out.write("file1-contains but file2 not\n")
    out.writelines(sorted(only_in_1))
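
Because the set-based script above discards line order, a variation that keeps each file's original order may be easier to read. This is a minimal sketch; the helper name lines_missing_from and the sample lists are hypothetical and stand in for the lines read from the two files.

```python
def lines_missing_from(source_lines, other_lines):
    """Return the lines of source_lines that never occur in other_lines,
    keeping their original order."""
    other = set(other_lines)  # set lookup keeps the membership test fast
    return [line for line in source_lines if line not in other]


# Hypothetical sample data standing in for file.readlines() results.
file1_lines = ["a\n", "b\n", "c\n", "d\n"]
file2_lines = ["a\n", "ba\n", "bb\n", "c\n", "def\n"]

only_in_file1 = lines_missing_from(file1_lines, file2_lines)
only_in_file2 = lines_missing_from(file2_lines, file1_lines)

print(only_in_file1)
print(only_in_file2)
```

Like the set version, this ignores where a line appears, so a line that merely moved is not reported. Use difflib when position matters.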


 