
Find Difference between 2 dataframes

I am new to pandas and have a question. I have two dataframes:

import pandas as pd

df1 = pd.DataFrame({'ID': ['ID1', 'ID2', 'ID3', 'ID6'],
                    'Value': ['59', '29', '73', '34']})

df2 = pd.DataFrame({'ID': ['ID1', 'ID2', 'ID4'],
                    'Value': ['54', '29', '73']})

I want an output dataframe that lists the IDs whose values changed (ID1) and the IDs that appear in only one of df1 and df2 (ID3, ID4 and ID6).

Thanks a lot in advance!

Do an outer merge and calculate the difference:

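out = df1.merge(df2, on='ID', how='outer')
# pop() removes each Value column and returns it; astype(float) turns the
# string values into numbers (IDs missing from one side stay NaN)
out['Difference'] = out.pop('Value_x').astype(float) - out.pop('Value_y').astype(float)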

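Output of out (this row order comes from an older pandas; since pandas 2.2 an outer merge sorts the keys, so ID4 comes before ID6):

    ID  Difference
0  ID1         5.0
1  ID2         0.0
2  ID3         NaN
3  ID6         NaN
4  ID4         NaN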
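The NaN rows above do not say which dataframe an unmatched ID came from. A minimal sketch, assuming the df1/df2 from the question, that adds the indicator=True option of DataFrame.merge so that a _merge column labels each row left_only (df1 only), right_only (df2 only) or both:

```python
import pandas as pd

df1 = pd.DataFrame({'ID': ['ID1', 'ID2', 'ID3', 'ID6'],
                    'Value': ['59', '29', '73', '34']})
df2 = pd.DataFrame({'ID': ['ID1', 'ID2', 'ID4'],
                    'Value': ['54', '29', '73']})

# indicator=True adds a categorical '_merge' column whose values are
# 'left_only' (ID only in df1), 'right_only' (ID only in df2) or 'both'
out = df1.merge(df2, on='ID', how='outer', indicator=True)
# Same difference as above; unmatched IDs still get NaN
out['Difference'] = out.pop('Value_x').astype(float) - out.pop('Value_y').astype(float)
print(out)
```

Filtering on out['_merge'] != 'both' then yields ID3, ID6 and ID4, while rows marked both with a non-zero Difference are the changed IDs.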

OR

Alternatively, fill the NaNs with 0 after merging:

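out = df1.merge(df2, on='ID', how='outer').fillna(0)
# fillna(0) treats a missing value as 0, so an ID found only in df1 gets +Value
# and an ID found only in df2 gets -Value
out['Difference'] = out.pop('Value_x').astype(float) - out.pop('Value_y').astype(float)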

Output of out:

    ID  Difference
0  ID1         5.0
1  ID2         0.0
2  ID3        73.0
3  ID6        34.0
4  ID4       -73.0
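The same numbers can be computed without a merge. A sketch of a swapped-in alternative (not what this answer does), assuming the question's df1/df2: index each Value column by ID and let Series.sub align the two on that index, with fill_value=0 playing the role of fillna(0):

```python
import pandas as pd

df1 = pd.DataFrame({'ID': ['ID1', 'ID2', 'ID3', 'ID6'],
                    'Value': ['59', '29', '73', '34']})
df2 = pd.DataFrame({'ID': ['ID1', 'ID2', 'ID4'],
                    'Value': ['54', '29', '73']})

# Index each Value column by ID and convert the strings to numbers
v1 = pd.to_numeric(df1.set_index('ID')['Value'])
v2 = pd.to_numeric(df2.set_index('ID')['Value'])

# Subtraction aligns the two Series on their ID index; fill_value=0 stands in
# for an ID that is missing on one side, like fillna(0) after the merge
diff = v1.sub(v2, fill_value=0).rename('Difference')
print(diff)
```

As with fillna(0), this cannot distinguish an ID that exists on only one side from one whose value really changed, so check ID membership separately if that matters.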

Another approach compares the sets of IDs directly:

ids1 = set(df1['ID'])
ids2 = set(df2['ID'])
# IDs that appear in only one of the two dataframes (symmetric difference)
individual = ids1 ^ ids2
# Look values up by ID, so the comparison does not depend on row positions
v1 = df1.set_index('ID')['Value']
v2 = df2.set_index('ID')['Value']
# IDs present in both dataframes whose values differ
changed = {i for i in ids1 & ids2 if v1[i] != v2[i]}
print(changed, individual)

Output (set order may vary):

{'ID1'} {'ID6', 'ID3', 'ID4'}
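The question asks for an output dataframe rather than sets. A minimal sketch, assuming the question's df1/df2, that builds the same two results as dataframes using isin and an inner merge; the names only_df1, only_df2, changed_rows, individual_rows and the Source column are illustrative choices, not part of the original answers:

```python
import pandas as pd

df1 = pd.DataFrame({'ID': ['ID1', 'ID2', 'ID3', 'ID6'],
                    'Value': ['59', '29', '73', '34']})
df2 = pd.DataFrame({'ID': ['ID1', 'ID2', 'ID4'],
                    'Value': ['54', '29', '73']})

# Rows whose ID does not occur in the other frame (dataframe analogue of the
# set differences); the Source column records where each row came from
only_df1 = df1[~df1['ID'].isin(df2['ID'])].assign(Source='df1')
only_df2 = df2[~df2['ID'].isin(df1['ID'])].assign(Source='df2')
individual_rows = pd.concat([only_df1, only_df2], ignore_index=True)

# An inner merge keeps only IDs present in both frames; keep those whose values differ
both = df1.merge(df2, on='ID', suffixes=('_df1', '_df2'))
changed_rows = both[both['Value_df1'] != both['Value_df2']]

print(changed_rows)
print(individual_rows)
```

Keeping the old and new values side by side in changed_rows shows what changed, not just which ID.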
