
Find difference between two csv file column wise using python

I have two CSV files, file1.csv:

col1,col2,col3
1,2,3
4,5,6
7,8,9

file2.csv

col1,col2,col3
0,2,3
4,0,6
7,8,9

I want to compare these two files column-wise and write the result to another file, file3.csv:

col1,col2
1,0
0,5
0,0

The code I tried:

import csv
with open('file1.csv', 'r') as t1:
    old_csv = t1.readlines()
with open('file2.csv', 'r') as t2:
    new_csv = t2.readlines()

with open('file3.csv', 'w') as out_file:
    line_in_new = 1
    line_in_old = 1
    leng=len(new_csv)
    out_file.write(new_csv[0])
    while line_in_new < len(new_csv) and line_in_old < len(old_csv):
        if (old_csv[line_in_old]) != (new_csv[line_in_new]):
            out_file.write(new_csv[line_in_new])
        else:
            line_in_old += 1
        line_in_new += 1

This is a slightly altered version of one of the answers here on Stack Overflow. How can I achieve a column-wise comparison?

Thanks in advance!

Once you have read the lines, you can do the following:

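with open('file3.csv', 'w') as new_file:
    new_file.write(old_csv[0])  # copy the header row
    for i in range(1, min(len(old_csv), len(new_csv))):  # start at 1 to skip the header
        old_values = old_csv[i].strip().split(",")
        new_values = new_csv[i].strip().split(",")
        # you can add slicing here ([start:stop]) to only select certain columns
        diffs = [str(int(old) - int(new)) for old, new in zip(old_values, new_values)]
        new_file.write(",".join(diffs) + "\n")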
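The loop above works cell by cell on raw strings. The question asks for a column-wise comparison, and the expected file3.csv only contains the columns that changed. Here is a sketch that uses only the standard csv module and transposes the rows with zip(*rows), so each column can be handled as a unit. It assumes both files have the same header and integer values. The helper names column_diff and write_changed_columns are made up for this example.

```python
import csv
import io


def column_diff(old_file, new_file):
    """Return {column: [old - new, ...]}, comparing the files column by column."""
    old_rows = list(csv.reader(old_file))
    new_rows = list(csv.reader(new_file))
    header = old_rows[0]
    # zip(*rows) transposes the data rows, so each tuple holds one column's values
    old_cols = zip(*old_rows[1:])
    new_cols = zip(*new_rows[1:])
    return {
        name: [int(o) - int(n) for o, n in zip(old_col, new_col)]
        for name, old_col, new_col in zip(header, old_cols, new_cols)
    }


def write_changed_columns(diffs, out_file):
    """Write only the columns that contain at least one nonzero difference."""
    changed = [name for name, values in diffs.items() if any(values)]
    writer = csv.writer(out_file, lineterminator="\n")
    writer.writerow(changed)
    # zip(*...) transposes the selected columns back into rows
    writer.writerows(zip(*(diffs[name] for name in changed)))


# Usage with the question's sample data (in-memory instead of real files)
file1 = io.StringIO("col1,col2,col3\n1,2,3\n4,5,6\n7,8,9\n")
file2 = io.StringIO("col1,col2,col3\n0,2,3\n4,0,6\n7,8,9\n")
out = io.StringIO()
write_changed_columns(column_diff(file1, file2), out)
print(out.getvalue())
```

With real files, open them with newline='' as the csv module documentation recommends. For example, pass open('file1.csv', newline='') and open('file2.csv', newline='') to column_diff, and open('file3.csv', 'w', newline='') to write_changed_columns.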

I hope that answers your question.

If both CSV files have the same number of rows and columns, you can use pandas to get the difference quickly. Note that pandas subtracts by matching row index and column labels, so the column names must match as well.

import pandas as pd

df1 = pd.read_csv('file1.csv')
df2 = pd.read_csv('file2.csv')

diff = df1 - df2
diff.to_csv('file3.csv', index=False)

The file3.csv contents will look like:

col1,col2,col3
1,0,0
0,5,0
0,0,0
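The expected file3.csv in the question lists only col1 and col2, presumably because col3 has no differences. Assuming that is the intent, you can filter the pandas result down to the columns that have at least one nonzero value. This sketch reads the sample data from strings so that it runs on its own. To use real files, replace the io.StringIO(...) arguments with the file paths.

```python
import io

import pandas as pd

# Sample data from the question, read from strings instead of files
df1 = pd.read_csv(io.StringIO("col1,col2,col3\n1,2,3\n4,5,6\n7,8,9\n"))
df2 = pd.read_csv(io.StringIO("col1,col2,col3\n0,2,3\n4,0,6\n7,8,9\n"))

diff = df1 - df2
# (diff != 0).any() gives one True/False per column: True if any cell differs
changed = diff.loc[:, (diff != 0).any()]
# Pass a path such as 'file3.csv' as the first argument to write to disk instead
print(changed.to_csv(index=False))
```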

James's answer is correct and should solve your problem. If you want to leave out some columns, such as an ID column or string columns, you can try the code below. Here, cols is the list of columns you want to calculate the difference for.

import pandas as pd

cols = ['req_col1','req_col2','req_col3']  # columns to calculate the difference for
df1 = pd.read_csv('file1.csv')
df2 = pd.read_csv('file2.csv')

df3 = pd.DataFrame(index=df1.index)  # empty frame with the same rows as df1
for col in cols:
    df3[col] = df1[col] - df2[col]
df3.to_csv('filepath.csv', index=False)
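You may want to keep the ID column in the output as a reference rather than drop it. One option is to subtract only the selected columns and then join the ID column back on the row index. The column names id, name, a and b, and the sample data, are hypothetical. The sketch also assumes both DataFrames have the same rows in the same order.

```python
import io

import pandas as pd

# Hypothetical data: an ID column and a string column that should not be subtracted
csv1 = "id,name,a,b\n1,x,10,20\n2,y,30,40\n"
csv2 = "id,name,a,b\n1,x,7,20\n2,y,30,35\n"
df1 = pd.read_csv(io.StringIO(csv1))
df2 = pd.read_csv(io.StringIO(csv2))

cols = ["a", "b"]  # numeric columns to compare
# Subtract only the selected columns, then join the id column back on the row index
df3 = df1[["id"]].join(df1[cols] - df2[cols])
print(df3.to_csv(index=False))
```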

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM