
How to get only the mismatched columns in pandas

When comparing two CSV files, I want to export only the columns that contain mismatches. The code below gives me all the columns, not just the mismatched ones. In this case only the email and profession columns differ between the two dataframes. How can I export only the mismatched columns?

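import pandas as pa

df1 = pa.read_csv('data1.csv')
df2 = pa.read_csv('data2.csv')

# Outer merge on all shared columns; _merge records which frame each row came from
diff = df1.merge(df2, indicator=True, how='outer')
# Keep the rows that appear only in df1 (every column is still included)
diff1 = diff[diff['_merge'] == 'left_only']
print(diff1)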

Below is the output:

    id firstname lastname  ...                     email2      profession     _merge
1  101  Gilligan    Brenn  ...   Gilligan.Brenn@gmail.com          worker  left_only
8  108   Sherrie   Ventre  ...   Sherrie.Ventre@gmail.com  police officer  left_only
9  109  Roseline   Roscoe  ...  Roseline.Roscoe@gmail.com       developer  left_only

You could try pandas.DataFrame.ne, like this:

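import pandas as pd

df1 = pd.read_csv('data1.csv')
df2 = pd.read_csv('data2.csv')

diffs = (
    df2[df2.ne(df1)]            # keep df2 values where they differ from df1, NaN elsewhere
    .dropna(axis=1, how="all")  # drop columns with no mismatches
    .dropna(axis=0, how="all")  # drop rows with no mismatches
    .fillna("")                 # blank out the remaining matching cells
)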

print(diffs)
# Outputs
                     email2   profession
1  Gilligan.Brenn@gmail.com
8                             worker
9  Roseline.Roscoe@gmail.com
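
If you also want the names of the mismatched columns, or both the old and new values side by side, the same idea can be sketched as below. This is only an illustration: the two small dataframes are made-up stand-ins for data1.csv and data2.csv, and it assumes both frames have the same shape, index and column labels. Note that ne treats NaN as different from NaN, so missing values on both sides show up as mismatches, whereas compare (available from pandas 1.1) treats them as equal.

```python
import pandas as pd

# Hypothetical stand-ins for data1.csv and data2.csv (same shape and labels)
df1 = pd.DataFrame({
    "id": [101, 108, 109],
    "firstname": ["Gilligan", "Sherrie", "Roseline"],
    "email": ["Gilligan.Brenn@gmail.com", "Sherrie.Ventre@gmail.com",
              "Roseline.Roscoe@gmail.com"],
    "profession": ["worker", "police officer", "developer"],
})
df2 = df1.copy()
df2.loc[0, "email"] = "G.Brenn@gmail.com"   # made-up mismatch
df2.loc[1, "profession"] = "firefighter"    # made-up mismatch

# Element-wise "not equal" mask: True wherever the two frames differ
mask = df1.ne(df2)

# Names of the columns that contain at least one mismatch
mismatched_cols = mask.columns[mask.any(axis=0)].tolist()

# Rows of df1 with at least one mismatch, restricted to those columns
diff_only = df1.loc[mask.any(axis=1), mismatched_cols]

# Alternative: DataFrame.compare shows both sides ("self" = df1, "other" = df2)
side_by_side = df1.compare(df2)

print(mismatched_cols)
print(diff_only)
print(side_by_side)
```

diff_only can then be written out with diff_only.to_csv(...) if a file export is needed.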


 