Comparing two text files in Python
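I need to compare two files and redirect the differing lines to a third file. I know I can get the differences with the diff command, but is there a way to do this in Python? Any sample code would be helpful.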
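Take a look at difflib.
This module provides classes and functions for comparing sequences. It can be used, for example, for comparing files, and can produce difference information in various formats, including HTML and context and unified diffs [...]
See the command-line example at http://docs.python.org/library/difflib.html#difflib-interface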
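As a rough illustration of the module described above, the sketch below compares two in-memory lists of lines with `difflib.unified_diff` and `difflib.ndiff`. The sample lines and the labels `old` and `new` are made up for the example.

```python
import difflib

# Hypothetical sample data: each item is one line, including its newline.
old_lines = ["a\n", "b\n", "c\n"]
new_lines = ["a\n", "x\n", "c\n"]

# Unified format: the same layout the `diff -u` command produces.
unified = list(difflib.unified_diff(old_lines, new_lines,
                                    fromfile="old", tofile="new"))
print("".join(unified), end="")

# ndiff format: every line carries a two-character prefix such as "  ", "- " or "+ ".
delta = list(difflib.ndiff(old_lines, new_lines))
print("".join(delta), end="")
```

Both functions return generators of strings, so the result can be printed, joined, or written to a file with `writelines`.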
# Compare two text files line by line.
test1filehandle = open("test1.txt", "r")  # file handle for the first input
test2filehandle = open("test2.txt", "r")  # file handle for the second input
test3filehandle = open("test3.txt", "w")  # file handle for the output
test1list = test1filehandle.readlines()  # read all lines into a list
test2list = test2filehandle.readlines()
# zip walks both lists in step; it stops at the end of the shorter one.
for k, (i, j) in enumerate(zip(test1list, test2list), start=1):
    if i != j:
        test3filehandle.write("Line Number:" + str(k) + ' ')
        test3filehandle.write(i.rstrip("\n") + ' ' + j.rstrip("\n") + "\n")
# Close the handles so the output is flushed to disk.
test1filehandle.close()
test2filehandle.close()
test3filehandle.close()
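One caveat with the `zip` loop above: `zip` stops at the end of the shorter file, so extra trailing lines in the longer file are never reported. A minimal sketch of one way around this uses `itertools.zip_longest`. The helper name `differing_lines` and the `<missing>` placeholder are invented for illustration.

```python
from itertools import zip_longest

def differing_lines(lines1, lines2, missing="<missing>"):
    """Yield (line_number, left, right) for each position where the lines differ."""
    pairs = zip_longest(lines1, lines2, fillvalue=None)
    for number, (left, right) in enumerate(pairs, start=1):
        if left != right:
            # A None means that file ran out of lines; show the placeholder.
            yield (number,
                   missing if left is None else left.rstrip("\n"),
                   missing if right is None else right.rstrip("\n"))

# Hypothetical in-memory data standing in for f.readlines() on two files.
first = ["a\n", "b\n", "c\n"]
second = ["a\n", "x\n", "c\n", "d\n"]
result = list(differing_lines(first, second))
for number, left, right in result:
    print(f"Line Number: {number} {left} {right}")
```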
Comparing two text files in Python?
Sure, difflib makes it easy.
Let's set up a demo:
f1path = 'file1'
f2path = 'file2'
text1 = '\n'.join(['a', 'b', 'c', 'd', ''])
text2 = '\n'.join(['a', 'ba', 'bb', 'c', 'def', ''])
for path, text in ((f1path, text1), (f2path, text2)):
    with open(path, 'w') as f:
        f.write(text)
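Now to inspect the differences. The lines that use os and time are only there to give a readable timestamp for when each file was last modified. They are completely optional, and they feed optional arguments to difflib.unified_diff: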
# optional imports:
import os
import time
# necessary import:
import difflib
Now we just open the files and pass their lists of lines (from f.readlines) to difflib.unified_diff, join the resulting list with the empty string, and print the result:
# Plain 'r' is used: the old 'U' flag is no longer accepted by recent Python 3 releases.
with open(f1path, 'r') as f1:
    with open(f2path, 'r') as f2:
        readable_last_modified_time1 = time.ctime(os.path.getmtime(f1path))  # not required
        readable_last_modified_time2 = time.ctime(os.path.getmtime(f2path))  # not required
        print(''.join(difflib.unified_diff(
            f1.readlines(), f2.readlines(), fromfile=f1path, tofile=f2path,
            fromfiledate=readable_last_modified_time1,  # not required
            tofiledate=readable_last_modified_time2,  # not required
        )))
This prints:
--- file1 Mon Jul 27 08:38:02 2015
+++ file2 Mon Jul 27 08:38:02 2015
@@ -1,4 +1,5 @@
 a
-b
+ba
+bb
 c
-d
+def
Again, you can remove all the lines marked optional / not required and get the same result, just without the timestamps.
Redirecting the differing lines to a third file
Instead of printing, open a third file and write the lines to it:
# Open both inputs and build the diff text.
with open(f1path, 'r') as f1, open(f2path, 'r') as f2:
    difftext = ''.join(difflib.unified_diff(
        f1.readlines(), f2.readlines(), fromfile=f1path, tofile=f2path,
        fromfiledate=readable_last_modified_time1,  # not required
        tofiledate=readable_last_modified_time2,  # not required
    ))
with open('diffon1and2', 'w') as diff_file:
    diff_file.write(difftext)
And:
$ cat diffon1and2
--- file1 Mon Jul 27 11:38:02 2015
+++ file2 Mon Jul 27 11:38:02 2015
@@ -1,4 +1,5 @@
 a
-b
+ba
+bb
 c
-d
+def
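Since the question asks for only the lines that differ, the unified diff can also be reduced to its `+` and `-` lines. The following is a sketch under a few assumptions: the function name `changed_lines` is hypothetical, context is switched off with `n=0`, and the two header lines plus the `@@` hunk markers are skipped. The demo content mirrors file1 and file2 but is kept in memory.

```python
import difflib
from itertools import islice

def changed_lines(lines1, lines2):
    """Return only the added/removed lines of a unified diff, markers included."""
    diff = difflib.unified_diff(lines1, lines2, n=0)  # n=0: no context lines
    # The first two lines are the ---/+++ headers; "@@" lines are hunk markers.
    body = islice(diff, 2, None)
    return [line for line in body if not line.startswith("@@")]

# Same demo content as file1 and file2 above, kept in memory here.
lines1 = ["a\n", "b\n", "c\n", "d\n"]
lines2 = ["a\n", "ba\n", "bb\n", "c\n", "def\n"]
only_changes = changed_lines(lines1, lines2)
print("".join(only_changes), end="")
```

The returned list can be handed to `diff_file.writelines(...)` in place of the full diff text if only the changed lines are wanted in the third file.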
import sys

if len(sys.argv) != 3:
    print("usage: " + sys.argv[0] + " file1 file2")
    sys.exit(1)

# Read each file into a set of lines (order and duplicates are lost).
with open(sys.argv[1]) as f1, open(sys.argv[2]) as f2:
    file1 = set(f1)
    file2 = set(f2)

file3 = file2.difference(file1)  # lines only in the second file
file4 = file1.difference(file2)  # lines only in the first file
str1 = "file1-contains but file2 not\n"
str2 = "file2-contains but file1 not\n"
with open('file3', 'w') as out:
    out.write(str2)
    out.writelines(file3)
    out.write(str1)
    out.writelines(file4)
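Because sets are unordered and drop duplicates, the output of the script above does not follow the order of the input files. A possible order-preserving variant is sketched below. The helper `lines_only_in` is a hypothetical name, and the sample lists stand in for the two input files.

```python
def lines_only_in(lines, other_lines):
    """Return the lines of `lines` that never occur in `other_lines`, in order."""
    other = set(other_lines)  # a set gives fast membership tests
    return [line for line in lines if line not in other]

# Hypothetical in-memory data standing in for the two input files.
file1_lines = ["a\n", "b\n", "c\n", "d\n"]
file2_lines = ["a\n", "ba\n", "bb\n", "c\n", "def\n"]

only_in_1 = lines_only_in(file1_lines, file2_lines)
only_in_2 = lines_only_in(file2_lines, file1_lines)
print("only in file1:", only_in_1)
print("only in file2:", only_in_2)
```

Like the set version, this ignores line positions: a line that merely moved to a different place in the other file is not reported.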