简体   繁体   中英

difference between two dataframes in Pandas

I am trying to find difference between two dataframe and the resulting df should return the rows matching the first dataframe. Since id's 6,7 was not there in df2 so the count value is as it is.

My Two Dataframes

在此处输入图片说明

Resulting Dataframe:

在此处输入图片说明

Use sub with set_index for align DataFrame s by id columns, add reindex for id only by df1.id :

df = (df1.set_index('id')
        .sub(df2.set_index('id'), fill_value=0)
        .reindex(df1['id'])
        .astype(int)
        .reset_index())
print (df)
   id  count
0   1      0
1   2      0
2   3      0
3   4      0
4   5      0
5   6      9
6   7      4

Another solution with merge and left join, then subtract by sub with extracting count_ column by pop :

df = df1.merge(df2, on='id', how='left', suffixes=('','_'))
df['count'] = df['count'].sub(df.pop('count_'), fill_value=0).astype(int)
print (df)
   id  count
0   1      0
1   2      0
2   3      0
3   4      0
4   5      0
5   6      9
6   7      4

Setup :

df1 = pd.DataFrame({'id':[1,2,3,4,5,6,7],
                    'count':[3,5,6,7,2,9,4]})

print (df1)
   id  count
0   1      3
1   2      5
2   3      6
3   4      7
4   5      2
5   6      9
6   7      4

df2 = pd.DataFrame({'id':[1,2,3,4,5,8,9],
                    'count':[3,5,6,7,2,4,2]})

print (df2)
   id  count
0   1      3
1   2      5
2   3      6
3   4      7
4   5      2
5   8      4
6   9      2

Use:

temp = pd.merge(df1, df2, how='left', on='id').fillna(0)
temp['count'] = temp['count_x'] - temp['count_y']
temp[['id', 'count']]

   id  count
0   1    0.0
1   2    0.0
2   3    0.0
3   4    0.0
4   5    0.0
5   6    9.0
6   7    4.0

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM