Compare two CSV files and output only rows with the specific columns that are different
I have two CSV files, each with 6 columns, and both have a common column, EmpID (the primary key used for comparison). For example, File1.csv is:
EmpID1,Name1,Email1,City1,Phone1,Hobby1
120034,Tom Hanks,tom.hanks@gmail.com,Mumbai,8888999,Fishing
and File2.csv is:
EmpID2,Name2,Email2,City2,Phone2,Hobby2
120034,Tom Hanks,hanks.tom@gmail.com,Mumbai,8888999,Running
I need to compare the files for differences, and only the rows and columns that differ should be added to a new output file, like this:
EmpID1,Email1,Email2,Hobby1,Hobby2
120034,tom.hanks@gmail.com,hanks.tom@gmail.com,Fishing,Running
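So far I have written the code below in Python. Now I would like to know how to identify and pick out the differences. Any pointers and help would be much appreciated.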
import csv
import os
os.getcwd()
os.chdir('filepath')
with open('File1.csv', 'r') as csv1, open('File2.csv', 'r') as csv2:
    file1 = csv1.readlines()
    file2 = csv2.readlines()
with open('OutputFile.csv', 'w') as output:
    for line in file1:
        if line not in file2:
            output.write(line)
output.close()
csv1.close()
csv2.close()
First, read each file into a dict, using the EmpID column as the key that points to the whole row:
import csv
fieldnames = []  # to store all fieldnames
with open('File1.csv', newline='') as f:
    cf = csv.DictReader(f, delimiter=',')
    # keys are case-sensitive: the header is 'EmpID1', not 'EMPID1'
    data1 = {row['EmpID1']: row for row in cf}
    fieldnames.extend(cf.fieldnames)
with open('File2.csv', newline='') as f:
    cf = csv.DictReader(f, delimiter=',')
    data2 = {row['EmpID2']: row for row in cf}
    fieldnames.extend(cf.fieldnames)
Then identify the IDs that appear in both dicts:
ids_to_check = set(data1) & set(data2)
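The intersection keeps only the IDs present in both files, so rows that exist in just one file are silently skipped. As a small sketch (the IDs 120035 and 120036 are made up for illustration and are not in the question's data), the same set operations can also report those unmatched rows:

```python
# Hypothetical stand-ins for the dicts built from the two files;
# only the keys (employee IDs) matter for this illustration.
data1 = {'120034': {}, '120035': {}}
data2 = {'120034': {}, '120036': {}}

ids_to_check = set(data1) & set(data2)   # present in both files
only_in_file1 = set(data1) - set(data2)  # missing from File2.csv
only_in_file2 = set(data2) - set(data1)  # missing from File1.csv

print(sorted(ids_to_check))
print(sorted(only_in_file1))
print(sorted(only_in_file2))
```

Whether unmatched rows should be reported at all is not stated in the question, so treat the two extra sets as optional.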
Finally, loop over the IDs and compare the rows themselves:
with open('OutputFile.csv', 'w', newline='') as f:
    cw = csv.DictWriter(f, fieldnames, delimiter=',')
    cw.writeheader()
    for id in ids_to_check:
        diff = compare_dict(data1[id], data2[id], fieldnames)
        if diff:
            cw.writerow(diff)
Here is an implementation of the compare_dict function:
def compare_dict(d1, d2, fields_compare):
    # strip the trailing '1'/'2' so 'Email1' and 'Email2' both become 'Email'
    fields_compare = set(field.rstrip('12') for field in fields_compare)
    if any(d1[k + '1'] != d2[k + '2'] for k in fields_compare):
        # they differ, return a new dict with all fields
        result = d1.copy()
        result.update(d2)
        return result
    else:
        return {}
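Note that compare_dict returns every field of a differing row, so the output above contains all twelve columns rather than only the changed ones. If only the differing columns are wanted, as in the expected output in the question, one possible variation is sketched below. The helper name diff_columns, the use of io.StringIO in place of real files, and the choice to leave a cell blank when a column did not change for that particular row are all assumptions made for illustration.

```python
import csv
import io

def diff_columns(row1, row2):
    """Return base column names ('Email', 'Hobby', ...) whose values differ."""
    bases = [name[:-1] for name in row1]  # drop the trailing '1'
    return [b for b in bases if row1[b + '1'] != row2[b + '2']]

# In-memory stand-ins for File1.csv and File2.csv (data from the question)
file1 = io.StringIO(
    "EmpID1,Name1,Email1,City1,Phone1,Hobby1\n"
    "120034,Tom Hanks,tom.hanks@gmail.com,Mumbai,8888999,Fishing\n"
)
file2 = io.StringIO(
    "EmpID2,Name2,Email2,City2,Phone2,Hobby2\n"
    "120034,Tom Hanks,hanks.tom@gmail.com,Mumbai,8888999,Running\n"
)

data1 = {row['EmpID1']: row for row in csv.DictReader(file1)}
data2 = {row['EmpID2']: row for row in csv.DictReader(file2)}

# First pass: find which columns differ for each shared ID, and collect
# the union of changed columns (in file order) to build the header.
diffs = {}
changed = []
for emp_id in sorted(set(data1) & set(data2)):
    cols = diff_columns(data1[emp_id], data2[emp_id])
    if cols:
        diffs[emp_id] = cols
        for col in cols:
            if col not in changed:
                changed.append(col)

# Second pass: write only the ID plus the changed column pairs.
out = io.StringIO()
writer = csv.writer(out, lineterminator='\n')
writer.writerow(['EmpID1'] + [col + suffix for col in changed for suffix in '12'])
for emp_id, cols in diffs.items():
    row = [emp_id]
    for col in changed:
        if col in cols:
            row += [data1[emp_id][col + '1'], data2[emp_id][col + '2']]
        else:
            row += ['', '']  # column unchanged for this row: leave blank
    writer.writerow(row)

print(out.getvalue())
```

For the sample data this prints the header EmpID1,Email1,Email2,Hobby1,Hobby2 followed by the single row for 120034. To work with real files, the two StringIO objects could be replaced by open('File1.csv', newline='') and open('File2.csv', newline='').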