How to compare all of the files in directory with each other two by two in Python?

I have a directory and I want to compare all of the files in it with each other and get the percentage of matching content between them. As a starting point, I decided to open one file and compare the other files against it:

import difflib
import fnmatch
import os

filelist = []
diff_list = []
f = open("D:/Desktop/sample/ff69.txt")
flines = f.readlines()
path = "D:/Desktop/sample"
for root, dirnames, filenames in os.walk(path):
    for filename in fnmatch.filter(filenames, '*.txt'):
        filelist.append(os.path.join(root, filename))

for m in filelist:
    g = open(m, 'r')
    glines = g.readlines()
    d = difflib.Differ()
    diffl = diff_list.append(d.compare(flines, glines))

print("".join(diff))

But my code does not work: when I print the result, I get "None". Does anyone have any idea why? Or is there a better way to compare all of the files in a directory two by two?

If you're attempting to compare files pairwise you probably want something closer to this:

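import difflib
import os

root = 'root'  # directory that holds the files to compare
files = os.listdir(root)
for idx, filename in enumerate(files):
  try:
    fcompare = files[idx + 1]
  except IndexError:
    # We've reached the last file.
    break
  # os.listdir returns bare names, so join them with the directory.
  with open(os.path.join(root, filename)) as f1:
    lines1 = f1.readlines()
  with open(os.path.join(root, fcompare)) as f2:
    lines2 = f2.readlines()
  # Actual diffing code. compare() returns a generator, so keep its result.
  d = difflib.Differ()
  diff = list(d.compare(lines1, lines2))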

That will compare files 1-2, 2-3, 3-4, and so on. Every file except the first and the last is read twice (file 2, for example, is used in loop iterations 1 and 2), so it may be worth reading each file only once and reusing its contents. Depending on the number of files, though, that may be premature optimization.
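As for the "None" in the question: `list.append` modifies the list in place and returns `None`, so assigning its result to a variable always gives `None`. Note also that `Differ.compare` returns a generator of diff lines, not a percentage. If the goal is a match percentage for every pair of files (not just neighbouring ones), something along these lines may be closer. It is only a sketch: the function names `read_lines` and `pairwise_similarity`, the `.txt` filter and the UTF-8 encoding are assumptions made for illustration, and it swaps `Differ` for `difflib.SequenceMatcher`, whose `ratio()` gives a similarity between 0 and 1.

```python
import difflib
import itertools
import os


def read_lines(path):
    """Read a text file once and return its lines (UTF-8 is an assumption)."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return handle.readlines()


def pairwise_similarity(directory, suffix=".txt"):
    """Return {(name_a, name_b): percent_match} for every pair of files.

    Hypothetical helper: compares files line by line and reports
    SequenceMatcher.ratio() scaled to a percentage.
    """
    names = sorted(
        name for name in os.listdir(directory)
        if name.endswith(suffix)
        and os.path.isfile(os.path.join(directory, name))
    )
    # Cache the contents so that each file is read exactly once.
    contents = {name: read_lines(os.path.join(directory, name)) for name in names}

    results = {}
    # combinations(names, 2) yields every unordered pair exactly once.
    for name_a, name_b in itertools.combinations(names, 2):
        matcher = difflib.SequenceMatcher(None, contents[name_a], contents[name_b])
        results[(name_a, name_b)] = 100.0 * matcher.ratio()
    return results
```

Calling `pairwise_similarity("D:/Desktop/sample")` and looping over the returned dictionary's `items()` would give one percentage per pair. The number of pairs grows quadratically with the number of files, so this is best suited to directories of modest size.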
