While comparing two CSV files, I want to export only the columns that contain mismatches. The code below gives me all the columns, not just the mismatched ones. In this case only the email and profession columns have values that differ between the two dataframes. How can I export only the mismatched columns?
import pandas as pa
df1 = pa.read_csv('data1.csv')
df2 = pa.read_csv('data2.csv')
diff = df1.merge(df2, indicator=True, how='outer')
diff1 = diff[diff['_merge'] == 'left_only']
print(diff1)
Below is the output
id firstname lastname ... email2 profession _merge
1 101 Gilligan Brenn ... Gilligan.Brenn@gmail.com worker left_only
8 108 Sherrie Ventre ... Sherrie.Ventre@gmail.com police officer left_only
9 109 Roseline Roscoe ... Roseline.Roscoe@gmail.com developer left_only
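You could try pandas.DataFrame.ne, which compares the two dataframes element by element (so it assumes both files have the same rows in the same order and the same columns), like this: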
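import pandas as pd
df1 = pd.read_csv('data1.csv')
df2 = pd.read_csv('data2.csv')
# Keep df2's values only where they differ from df1 (everything else becomes NaN),
# then drop the columns and rows that contain no differences at all.
diffs = (
    df2[df2.ne(df1)]
    .dropna(axis=1, how="all")
    .dropna(axis=0, how="all")
    .fillna("")
)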
print(diffs)
# Outputs
email2 profession
1 Gilligan.Brenn@gmail.com
8 worker
9 Roseline.Roscoe@gmail.com
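If you also want to write the result out and see the values from both files side by side, something along these lines might work. This is only a sketch: the two small dataframes are made-up stand-ins for data1.csv and data2.csv, the names mismatch_cols, mismatch_rows and report are mine, and it assumes both frames have the same rows in the same order. Note that ne treats NaN as different from NaN, so empty cells in both files would show up as mismatches.

```python
import pandas as pd

# Hypothetical stand-ins for data1.csv and data2.csv (same rows, same columns).
df1 = pd.DataFrame({
    "id": [101, 102, 103],
    "firstname": ["Gilligan", "Sherrie", "Roseline"],
    "email": ["g.brenn@example.com", "s.ventre@example.com", "r.roscoe@example.com"],
    "profession": ["worker", "police officer", "developer"],
})
df2 = df1.copy()
df2.loc[0, "email"] = "gilligan@example.com"   # introduce one email mismatch
df2.loc[1, "profession"] = "firefighter"       # introduce one profession mismatch

# Boolean frame: True wherever the two files disagree.
mismatch = df1.ne(df2)

# Columns and rows that contain at least one mismatch.
mismatch_cols = [col for col in mismatch.columns if mismatch[col].any()]
mismatch_rows = mismatch.any(axis=1)

# Values from both files, restricted to the mismatching rows and columns.
report = df1.loc[mismatch_rows, mismatch_cols].join(
    df2.loc[mismatch_rows, mismatch_cols], lsuffix="_df1", rsuffix="_df2"
)

# to_csv() without a path returns the CSV text; pass a filename to write a file.
csv_text = report.to_csv()
print(csv_text)
```

To write a file instead, call report.to_csv('mismatches.csv'). If you are on pandas 1.1 or newer, df1.compare(df2) is also worth a look, since it reports only the differing columns and rows for identically labeled dataframes.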