Comparing two text files in python

I need to compare two files and redirect the differing lines to a third file. I know I can get the difference with the diff command, but is there a way to do it in Python? Any sample code would be helpful.

Check out difflib:

This module provides classes and functions for comparing sequences. It can be used for example, for comparing files, and can produce difference information in various formats, including HTML and context and unified diffs[...]

There is a command-line example at http://docs.python.org/library/difflib.html#difflib-interface
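To make the "various formats" concrete, here is a minimal sketch that produces a unified diff, a context diff and an HTML table from the same input. The sample lines and the labels "old" and "new" are made up for illustration; in practice the lists would come from readlines() on the two files.

```python
import difflib

# Hypothetical sample data standing in for the lines of two files.
old_lines = ["a\n", "b\n", "c\n"]
new_lines = ["a\n", "x\n", "c\n"]

# Unified format: removed lines start with '-', added lines with '+'.
unified = list(difflib.unified_diff(old_lines, new_lines,
                                    fromfile="old", tofile="new"))

# Context format: changed lines are marked with '! ' in both halves.
context = list(difflib.context_diff(old_lines, new_lines,
                                    fromfile="old", tofile="new"))

# HTML format: a side-by-side <table> that can be embedded in a page.
html_table = difflib.HtmlDiff().make_table(old_lines, new_lines, "old", "new")

print("".join(unified))
print("".join(context))
```

All three calls take the two sequences of lines in the same order, so switching formats is mostly a matter of changing which function is called.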

# Compare two text files line by line.

with open("test1.txt", "r") as test1file, \
     open("test2.txt", "r") as test2file, \
     open("test3.txt", "w") as test3file:
    test1list = test1file.readlines()  # read the lines into a list
    test2list = test2file.readlines()
    # zip pairs up the lines of both lists; it stops at the end of the shorter one
    for k, (i, j) in enumerate(zip(test1list, test2list), start=1):
        if i != j:
            # write the line number followed by both versions of the line
            test3file.write("Line Number:" + str(k) + ' ')
            test3file.write(i.rstrip("\n") + ' ' + j.rstrip("\n") + "\n")
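One caveat with the zip approach: zip stops at the end of the shorter list, so extra lines at the end of the longer file are never reported. A possible workaround, sketched here with a hypothetical helper named differing_lines, is itertools.zip_longest, which pads the shorter input with None:

```python
from itertools import zip_longest


def differing_lines(lines_a, lines_b):
    """Yield (line_number, line_a, line_b) wherever the two inputs differ.

    A missing line (one input is shorter) shows up as None.
    """
    pairs = zip_longest(lines_a, lines_b)  # pads the shorter input with None
    for number, (a, b) in enumerate(pairs, start=1):
        if a != b:
            yield number, a, b


# Hypothetical sample data: the second "file" has one extra line.
result = list(differing_lines(["a\n", "b\n"], ["a\n", "c\n", "d\n"]))
print(result)
```

The lists passed in would normally be the output of readlines() on each file, as in the code above.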

Comparing two text files in python?

Sure, difflib makes it easy.

Let's set up a demo:

f1path = 'file1'
f2path = 'file2'
text1 = '\n'.join(['a', 'b', 'c', 'd', ''])
text2 = '\n'.join(['a', 'ba', 'bb', 'c', 'def', ''])
for path, text in ((f1path, text1), (f2path, text2)):
    with open(path, 'w') as f:
        f.write(text)

Now to inspect a diff. The os and time imports are used only to produce a readable last-modified timestamp for each file. They are entirely optional, as are the fromfiledate and tofiledate arguments to difflib.unified_diff that they feed:

# optional imports:
import os
import time
# necessary import:
import difflib

Now we just open the files, pass a list of each file's lines (from f.readlines ) to difflib.unified_diff , join the lines it generates with an empty string, and print the result:

# Note: the old 'rU' mode was removed in Python 3.11; plain 'r' already uses universal newlines.
with open(f1path, 'r') as f1:
    with open(f2path, 'r') as f2:
        readable_last_modified_time1 = time.ctime(os.path.getmtime(f1path)) # not required
        readable_last_modified_time2 = time.ctime(os.path.getmtime(f2path)) # not required
        print(''.join(difflib.unified_diff(
          f1.readlines(), f2.readlines(), fromfile=f1path, tofile=f2path,
          fromfiledate=readable_last_modified_time1, # not required
          tofiledate=readable_last_modified_time2, # not required
          )))

which prints:

--- file1       Mon Jul 27 08:38:02 2015
+++ file2       Mon Jul 27 08:38:02 2015
@@ -1,4 +1,5 @@
 a
-b
+ba
+bb
 c
-d
+def

Again, you can remove all the lines that are declared optional/not required and get the otherwise same results without the timestamp.
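For reference, a stripped-down version without the timestamp arguments might look like the sketch below. The helper name unified_diff_text and the temporary directory are assumptions made so the example is self-contained; the sample text is the same as in the demo above.

```python
import difflib
import os
import tempfile


def unified_diff_text(path_a, path_b):
    """Return the unified diff of two text files as one string (no timestamps)."""
    with open(path_a) as file_a, open(path_b) as file_b:
        diff_lines = difflib.unified_diff(
            file_a.readlines(), file_b.readlines(),
            fromfile=path_a, tofile=path_b)
        return ''.join(diff_lines)


# Demo in a throwaway directory, using the same sample text as above.
with tempfile.TemporaryDirectory() as tmp:
    f1path = os.path.join(tmp, 'file1')
    f2path = os.path.join(tmp, 'file2')
    with open(f1path, 'w') as f:
        f.write('\n'.join(['a', 'b', 'c', 'd', '']))
    with open(f2path, 'w') as f:
        f.write('\n'.join(['a', 'ba', 'bb', 'c', 'def', '']))
    diff_text = unified_diff_text(f1path, f2path)

print(diff_text)
```

The hunk is identical to the one shown above; only the date column in the two header lines is gone.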

redirect the different lines to a third file

Instead of printing, open a third file and write the diff text to it:

        difftext = ''.join(difflib.unified_diff(
          f1.readlines(), f2.readlines(), fromfile=f1path, tofile=f2path, 
          fromfiledate=readable_last_modified_time1, # not required
          tofiledate=readable_last_modified_time2, # not required
          ))
        with open('diffon1and2', 'w') as diff_file:
            diff_file.write(difftext)

and:

$ cat diffon1and2
--- file1       Mon Jul 27 11:38:02 2015
+++ file2       Mon Jul 27 11:38:02 2015
@@ -1,4 +1,5 @@
 a
-b
+ba
+bb
 c
-d
+def

import sys

if len(sys.argv) != 3:
    print("usage: " + sys.argv[0] + " file1 file2")
    sys.exit(1)

# Read each file into a set of lines (order and duplicate lines are lost).
with open(sys.argv[1]) as f1, open(sys.argv[2]) as f2:
    file1 = set(f1)
    file2 = set(f2)
file3 = file2.difference(file1)  # lines only in the second file
file4 = file1.difference(file2)  # lines only in the first file
str1 = "file1-contains but  file2 not \n"
str2 = "file2-contains but  file1 not\n"
with open('file3', 'w') as out:
    out.write(str2)
    out.writelines(file3)
    out.write(str1)
    out.writelines(file4)
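Because sets are unordered and drop duplicates, the script above writes the differing lines in arbitrary order. If the original order matters, one possible variation (a sketch only; the helper name only_in_first is made up here) keeps a set for fast lookups but walks the lines as a list:

```python
def only_in_first(lines_a, lines_b):
    """Return the lines of lines_a that never occur in lines_b, in original order."""
    seen = set(lines_b)  # set lookup keeps the membership test fast
    return [line for line in lines_a if line not in seen]


# Hypothetical sample data standing in for the lines of two files.
first = ["a\n", "b\n", "c\n", "b\n"]
second = ["a\n"]
print(only_in_first(first, second))
```

Like the set version, this ignores line positions entirely, so a line that merely moved within the file is not reported as a difference.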
