
Compare 2 CSV files

I am having difficulty comparing two CSV files and printing out a separate report. I want my script to first match the IDs in the two files, then compare the rest of each row and write a separate report showing the differences. The script I have compares the two files and prints the differences, but it doesn't work if the new file has additional rows.

example of the two files:

OLD file

ID  fname   lname   status
1   joe pol active
2   peters  dol active
3   john    nol active
4   mike    sol active

New file

ID  fname   lname   status
1   joe pol active
2   peter   dol active
67  ryan    olson   stop
3   johnny  nolly   stop 
4   mike    sol active

Code:

import csv

orig = open('OLD.csv','r')
new = open('NEW.csv','r')

Change = set(new) - set(orig)

print(Change)

with open('OLD.csv', mode='r') as infile:
    reader = csv.reader(infile)
    with open('different.csv', 'w') as file_out:
        for line in Change:
            file_out.write(line)

orig.close()
new.close()
file_out.close()

Since CSV files need to be comma separated, I'm assuming your files can be in this format:

old.csv:

ID,fname,lname,status
1,joe,pol,active
2,peters,dol,active
3,john,nol,active
4,mike,sol,active

new.csv:

ID,fname,lname,status
1,joe,pol,active
2,peter,dol,active
67,ryan,olson,stop
3,johnny,nolly,stop
4,mike,sol,active
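
If the real files are whitespace separated, as they appear in the question, a small conversion step could produce the comma-separated format first. This is only a sketch: the function name whitespace_to_csv and the .txt file names are made up for illustration, and it assumes no field contains spaces.

```python
import csv


def whitespace_to_csv(src_path, dst_path):
    """Rewrite a whitespace-separated table as a comma-separated file."""
    with open(src_path) as src, open(dst_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        for line in src:
            fields = line.split()  # splits on any run of spaces or tabs
            if fields:  # skip blank lines
                writer.writerow(fields)
```

For example, whitespace_to_csv("old.txt", "old.csv") would write the old.csv shown above, given the OLD file from the question saved as old.txt.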

Then you can generate a report of the differences with this code:

from csv import reader


# Creates a row dictionary from file
def get_row_map(filename):
    row_map = {}

    # newline="" lets the csv module handle line endings itself
    with open(filename, newline="") as file:
        csv_reader = reader(file)
        _, *headers = next(csv_reader)

        # map ids to rows
        for row in csv_reader:
            idx, *rest = row
            row_map[int(idx)] = dict(zip(headers, rest))

    return row_map


old_row_map = get_row_map("old.csv")
new_row_map = get_row_map("new.csv")

with open("different.txt", "w") as out:

    # Only loop over ids present in both files, in ascending order
    for row_id in sorted(old_row_map.keys() & new_row_map.keys()):
        old_row, new_row = old_row_map[row_id], new_row_map[row_id]

        # only proceed if rows are not exactly the same
        if old_row != new_row:

            # write out report, listing changed columns in header order
            out.write("ID: %d\n" % row_id)
            for key in old_row:
                if old_row[key] != new_row.get(key):
                    out.write(
                        "%s -> old: %s, new: %s\n"
                        % (key, old_row[key], new_row.get(key))
                    )

This writes the following to different.txt:

ID: 2
fname -> old: peters, new: peter
ID: 3
fname -> old: john, new: johnny
lname -> old: nol, new: nolly
status -> old: active, new: stop
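
The report above only covers IDs that appear in both files, so a row that exists only in the new file (ID 67 here) is silently skipped. One possible way to report those rows as well is sketched below. The helper names load_rows and write_added_removed are hypothetical, and the sketch assumes the first column is always an integer ID.

```python
import csv


def load_rows(filename):
    """Map each ID (first column) to a dict of the remaining columns."""
    with open(filename, newline="") as f:
        rows = csv.reader(f)
        _, *headers = next(rows)
        # skip blank lines, which csv.reader yields as empty lists
        return {int(row[0]): dict(zip(headers, row[1:])) for row in rows if row}


def write_added_removed(old_rows, new_rows, out):
    """Write one line per ID that exists in only one of the two files."""
    # IDs only in the new file were added
    for row_id in sorted(new_rows.keys() - old_rows.keys()):
        out.write("ID: %d added -> %s\n" % (row_id, new_rows[row_id]))
    # IDs only in the old file were removed
    for row_id in sorted(old_rows.keys() - new_rows.keys()):
        out.write("ID: %d removed -> %s\n" % (row_id, old_rows[row_id]))
```

Calling write_added_removed(load_rows("old.csv"), load_rows("new.csv"), out) with an open output file would, for the sample files, report ID 67 as added and nothing as removed.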
