Find the difference between two CSV files column-wise using Python

Question

I have two CSV files. file1.csv:

col1,col2,col3
1,2,3
4,5,6
7,8,9

file2.csv:

col1,col2,col3
0,2,3
4,0,6
7,8,9

I want to compare these two files column-wise and write the result to another file, file3.csv:

col1,col2
1,0
0,5
0,0

The code I tried:

import csv
with open('file1.csv', 'r') as t1:
    old_csv = t1.readlines()
with open('file2.csv', 'r') as t2:
    new_csv = t2.readlines()

with open('file3.csv', 'w') as out_file:
    line_in_new = 1
    line_in_old = 1
    leng=len(new_csv)
    out_file.write(new_csv[0])
    while line_in_new < len(new_csv) and line_in_old < len(old_csv):
        if (old_csv[line_in_old]) != (new_csv[line_in_new]):
            out_file.write(new_csv[line_in_new])
        else:
            line_in_old += 1
        line_in_new += 1

This is a slightly altered version of one of the answers here on Stack Overflow. How can I achieve a column-wise comparison?

Thanks in advance!

Answer 1

Once you have read the lines of both files (as in your code), you can do the following:

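with open('file3.csv', 'w') as new_file:
    new_file.write(new_csv[0])  # copy the header row unchanged
    for i in range(1, min(len(old_csv), len(new_csv))):  # start at 1 to skip the header
        old_values = old_csv[i].strip().split(",")
        new_values = new_csv[i].strip().split(",")
        # you can add slicing here ([start:stop]) to only select certain columns
        diffs = [str(int(old) - int(new)) for old, new in zip(old_values, new_values)]  # file1 minus file2, e.g.
        new_file.write(",".join(diffs) + "\n")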

I hope that answers your question.
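
The same idea can also be sketched with the standard csv module, which takes care of splitting the fields and stripping line endings. This is only a minimal sketch, not the code above: the function name column_diff is made up for illustration, and it assumes both files have the same columns in the same order and contain only integers.

```python
import csv

def column_diff(old_path, new_path, out_path):
    """Write old - new for every cell; column_diff is a hypothetical helper name."""
    with open(old_path, newline='') as f_old, \
         open(new_path, newline='') as f_new, \
         open(out_path, 'w', newline='') as f_out:
        old_rows = csv.reader(f_old)
        new_rows = csv.reader(f_new)
        writer = csv.writer(f_out)

        # Copy the header from the first file; assumes both headers match.
        header = next(old_rows)
        next(new_rows)
        writer.writerow(header)

        # zip stops at the shorter file, like min(len(...)) above.
        for old_row, new_row in zip(old_rows, new_rows):
            writer.writerow([int(o) - int(n) for o, n in zip(old_row, new_row)])
```

Calling column_diff('file1.csv', 'file2.csv', 'file3.csv') on the files from the question should write the per-cell differences of file1 minus file2.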

Answer 2

If the number of rows and columns is the same in both CSV files, you can use pandas to quickly get the difference.

import pandas as pd

df1 = pd.read_csv('file1.csv')
df2 = pd.read_csv('file2.csv')

diff = df1 - df2
diff.to_csv('file3.csv', index=False)

The file3.csv contents will look like:

col1,col2,col3
1,0,0
0,5,0
0,0,0
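
The expected file3.csv in the question lists only col1 and col2, the columns where something changed. If that is the goal, one possible follow-up step is to drop the all-zero columns from the difference. The sketch below reads the two tables from inline strings via io.StringIO so that it runs on its own; with real files you would pass the file names to pd.read_csv instead. Treating "all zeros" as "unchanged" is an assumption about what the asker wants.

```python
import io
import pandas as pd

# Inline copies of file1.csv and file2.csv from the question.
file1 = io.StringIO("col1,col2,col3\n1,2,3\n4,5,6\n7,8,9\n")
file2 = io.StringIO("col1,col2,col3\n0,2,3\n4,0,6\n7,8,9\n")

df1 = pd.read_csv(file1)
df2 = pd.read_csv(file2)

diff = df1 - df2

# Keep only the columns that contain at least one non-zero difference.
changed = diff.loc[:, (diff != 0).any()]

print(changed.to_csv(index=False))
```

For the sample data this leaves col1 and col2 and drops col3, since col3 is identical in both files.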

Answer 3

The answer from James is correct and should solve your problem. If you want to skip some columns, such as an ID column or string columns, you can try the code below. cols is the list of columns whose difference you want to calculate.

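import pandas as pd

cols = ['req_col1','req_col2','req_col3']
df3 = pd.DataFrame()  # start empty; pd.DataFrame(cols) would add a stray column holding the names
df1 = pd.read_csv('file1.csv')
df2 = pd.read_csv('file2.csv')

for col in cols:
    df3[col] = df1[col] - df2[col]
df3.to_csv('filepath.csv', index=False)  # index=False keeps the row index out of the file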
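
If you would rather not type the column list by hand, one option is to let pandas pick the numeric columns for you. This is a minimal sketch under some assumptions: the column names id_col, name, a and b are invented sample data, both frames hold the same rows in the same order, and an integer ID column also counts as numeric, so it is excluded by name here.

```python
import io
import pandas as pd

# Invented sample data: one ID column, one string column, two numeric columns.
df1 = pd.read_csv(io.StringIO("id_col,name,a,b\n1,x,10,20\n2,y,30,40\n"))
df2 = pd.read_csv(io.StringIO("id_col,name,a,b\n1,x,7,20\n2,y,30,35\n"))

# select_dtypes picks the numeric columns; drop the ID column by name.
cols = df1.select_dtypes(include='number').columns.drop('id_col')

# Subtract only the selected columns and keep the ID for reference.
df3 = df1[cols] - df2[cols]
df3.insert(0, 'id_col', df1['id_col'])

print(df3.to_csv(index=False))
```

The string column is skipped automatically, and the ID column is carried over unchanged rather than subtracted.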
