
How do I compare two text files from different folders?

Assume that I have two folders with 1000 text files in them, for example, folder 1 and folder 2.

Those two folders have text files with the same names, for example:

folder 1: ab.txt,  bc.txt,  cd.txt, ac.txt, etc. 
folder 2:  ab.txt,  bc.txt,  cd.txt, ac.txt, etc. 

Each text file contains a bunch of numbers. Here is an example of the contents; ab.txt from folder 1 has:

5 0.796 0.440 0.407 0.399
24 0.973 0.185 0.052 0.070
3 0.91 0.11 0.12 0.1

and ab.txt from folder 2 has:

1 0.8 0.45 0.407 0.499
24 0.973 0.185 0.052 0.070
5 5.91 6.2 2.22 0.2

I want to read the text files in those two folders and compare the first column of each pair of text files that share the same name (as shown above). If the first columns of the two text files contain different numbers, I want to move that file from folder_1 to another folder called "output". Here is what I wrote. I can compare two text files, but how do I compare the same-named text files located in two different folders?

import difflib

with open(r'path to txt file') as file_1:
    file_1_text = file_1.readlines()

with open(r'path to txt file') as file_2:
    file_2_text = file_2.readlines()

# Find and print the diff:
for line in difflib.unified_diff(
        file_1_text, file_2_text, fromfile='file1.txt',
        tofile='file2.txt', lineterm=''):
    print(line)



You can create a list of all entries in a folder with os.listdir().

import os

folder1_files = os.listdir(folder_path1)
folder2_files = os.listdir(folder_path2)

Then you can iterate over both lists and check whether the file names are equal.

for file1 in folder1_files:
    for file2 in folder2_files:
        if file1 == file2:
            ...
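
The nested loop above tests every pair of names, which is fine for 1000 files but does more work than necessary. As an alternative sketch, a set intersection finds the shared names directly. Note that `common_file_names` is a hypothetical helper introduced here for illustration; it is not part of the answer's code.

```python
import os


def common_file_names(folder_path1, folder_path2):
    """Return the sorted names that appear in both folders (hypothetical helper)."""
    names1 = set(os.listdir(folder_path1))
    names2 = set(os.listdir(folder_path2))
    # Set intersection keeps only the names present in both listings.
    return sorted(names1 & names2)
```

Each name returned this way can then be joined with both folder paths, exactly as in the next step.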

Comparing the first line is also not that difficult. Read the lines of both files and check whether their first lines differ.

file1_path = os.path.join(folder_path1, file1)
file2_path = os.path.join(folder_path2, file2)
file1_file = open(file1_path, 'r')
file2_file = open(file2_path, 'r')
file1_lines = file1_file.readlines()
file2_lines = file2_file.readlines()
if file1_lines[0] != file2_lines[0]:
    ...

I would use either shutil.move or shutil.copy to move or copy the files (the "output" folder must already exist).

shutil.copy(file1_path, "output/" + file1)

Finally, close the files:

file1_file.close()
file2_file.close()

All together in a function:

import os
import shutil


def compare_files(folder_path1, folder_path2, output_path="output"):
    # Create the output folder once, before any file is copied.
    os.makedirs(output_path, exist_ok=True)
    folder1_files = os.listdir(folder_path1)
    folder2_files = set(os.listdir(folder_path2))
    for file1 in folder1_files:
        # Skip names that have no counterpart in the second folder.
        if file1 not in folder2_files:
            continue
        file1_path = os.path.join(folder_path1, file1)
        file2_path = os.path.join(folder_path2, file1)
        # "with" closes both files automatically, so no explicit close() is needed.
        with open(file1_path, 'r') as file1_file, open(file2_path, 'r') as file2_file:
            file1_lines = file1_file.readlines()
            file2_lines = file2_file.readlines()
        # Compare only the first line; slicing avoids an IndexError on empty files.
        if file1_lines[:1] != file2_lines[:1]:
            shutil.copy(file1_path, os.path.join(output_path, file1))


compare_files("folder1", "folder2")

If you want to compare the values as numbers, so that, for example, 1 is treated as equal to 1.0, you can do the following.

l1 = file1_lines[0].split()
l2 = file2_lines[0].split()
# zip stops at the shorter line, so only positions present in both are compared.
for a, b in zip(l1, l2):
    if float(a) != float(b):
        output_path = "output"
        os.makedirs(output_path, exist_ok=True)
        shutil.copy(file1_path, output_path)
        break
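
The snippets above compare the first line of each file, whereas the question asks about the first column. A minimal sketch of that variant follows, assuming every non-empty line starts with a number and that files should be moved rather than copied. The helper names `first_column` and `move_if_first_column_differs` are hypothetical and were introduced only for this illustration.

```python
import os
import shutil


def first_column(path):
    """Return the first whitespace-separated field of each non-empty line as a float."""
    with open(path, "r") as handle:
        # Assumption: the first field of every non-empty line is numeric.
        return [float(line.split()[0]) for line in handle if line.strip()]


def move_if_first_column_differs(folder_path1, folder_path2, output_path="output"):
    """Move files out of folder_path1 when their first column differs from the
    same-named file in folder_path2. Returns the list of moved file names."""
    os.makedirs(output_path, exist_ok=True)
    moved = []
    shared = set(os.listdir(folder_path1)) & set(os.listdir(folder_path2))
    for name in sorted(shared):
        path1 = os.path.join(folder_path1, name)
        path2 = os.path.join(folder_path2, name)
        # Ignore subfolders and anything else that is not a regular file.
        if not (os.path.isfile(path1) and os.path.isfile(path2)):
            continue
        # Lists of floats compare element by element, so 1 equals 1.0,
        # and files with a different number of lines count as different.
        if first_column(path1) != first_column(path2):
            shutil.move(path1, os.path.join(output_path, name))
            moved.append(name)
    return moved
```

With the two sample ab.txt files from the question, the first columns are 5, 24, 3 and 1, 24, 5, so that file would be moved to the output folder.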


 