Calculate difference of two columns from two different dataframes based on condition

I have two dataframes with common columns. I would like to create a new column that contains the difference between two columns (one from each dataframe) based on a condition from a third column.

df_a:
Time      Volume    ID    
1         5         1
2         6         2
3         7         3

df_b:
Time      Volume    ID
1          2        2
2          3        1
3          4        3

The desired output appends a new column to df_a holding the difference between the Volume columns (df_a.Volume - df_b.Volume) for rows where the two IDs are equal.

df_a:

Time      Volume    ID    Diff   
1         5         1     2
2         6         2     4
3         7         3     3

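If ID is unique per row in each dataframe (strictly, it only has to be unique in df_b), look up df_b's Volume for each ID in df_a and subtract it: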

df_a['Diff'] = df_a['Volume'] - df_a['ID'].map(df_b.set_index('ID')['Volume'])

Output:

   Time  Volume  ID  Diff
0     1       5   1     2
1     2       6   2     4
2     3       7   3     3
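
The one-liner above depends on df_b having one row per ID, since map() cannot look values up in a Series with a duplicated index. A minimal sketch of the same approach with that assumption checked explicitly (the `lookup` name and the assertion message are illustrative, not part of the original answer):

```python
import pandas as pd

df_a = pd.DataFrame({'Time': [1, 2, 3], 'Volume': [5, 6, 7], 'ID': [1, 2, 3]})
df_b = pd.DataFrame({'Time': [1, 2, 3], 'Volume': [2, 3, 4], 'ID': [2, 1, 3]})

# Build an ID -> Volume lookup from df_b; map() needs this index to be unique.
lookup = df_b.set_index('ID')['Volume']
assert lookup.index.is_unique, "df_b has duplicate IDs; deduplicate or aggregate first"

# IDs in df_a that are absent from df_b map to NaN, so Diff is NaN for those rows.
df_a['Diff'] = df_a['Volume'] - df_a['ID'].map(lookup)
print(df_a)
```

With the sample data every ID is matched, so Diff comes out as 2, 4, 3.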

An option is to merge the two dfs on ID and then calculate Diff:

df_a = df_a.merge(df_b.drop(['Time'], axis=1), on="ID", suffixes=['', '2'])
df_a['Diff'] = df_a['Volume'] - df_a['Volume2']

df_a:

   Time  Volume  ID  Volume2  Diff
0     1       5   1        3     2
1     2       6   2        2     4
2     3       7   3        4     3
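
The merge above uses the default inner join, so rows of df_a without a matching ID in df_b would be dropped, and the helper column Volume2 stays in the result. A hedged variation, assuming you want to keep every row of df_a and end up with only the Diff column added (the names `vol_b`, `Volume_b` and `result` are made up for this sketch):

```python
import pandas as pd

df_a = pd.DataFrame({'Time': [1, 2, 3], 'Volume': [5, 6, 7], 'ID': [1, 2, 3]})
df_b = pd.DataFrame({'Time': [1, 2, 3], 'Volume': [2, 3, 4], 'ID': [2, 1, 3]})

# Bring in df_b's Volume under a distinct name so no suffixes are needed.
vol_b = df_b[['ID', 'Volume']].rename(columns={'Volume': 'Volume_b'})

# how='left' keeps every row of df_a; validate raises MergeError if df_b repeats an ID.
result = df_a.merge(vol_b, on='ID', how='left', validate='many_to_one')
result['Diff'] = result['Volume'] - result['Volume_b']

# Drop the helper column once the difference has been computed.
result = result.drop(columns='Volume_b')
print(result)
```

Unmatched IDs would leave NaN in Diff rather than removing the row.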

Merge the two dataframes on 'ID', then take the difference:

import pandas as pd

df_a = pd.DataFrame({'Time': [1,2,3], 'Volume': [5,6,7], 'ID':[1,2,3]})
df_b = pd.DataFrame({'Time': [1,2,3], 'Volume': [2,3,4], 'ID':[2,1,3]})

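# Left merge keeps df_a's rows and order; shared columns get the suffixes _x (df_a) and _y (df_b).
merged = pd.merge(df_a, df_b, on='ID', how='left')
# to_numpy() assigns by position, so the result does not depend on df_a's index labels.
df_a['Diff'] = (merged['Volume_x'] - merged['Volume_y']).to_numpy()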

print(df_a)
#output:

   Time  Volume  ID  Diff
0     1       5   1     2
1     2       6   2     4
2     3       7   3     3
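
Assigning a column computed from a merged frame back to df_a depends on how the rows line up. A minimal sketch, assuming df_a carries a non-default index (the labels 10, 20, 30 are invented for illustration), of why assigning by position may be the safer choice:

```python
import pandas as pd

df_a = pd.DataFrame({'Time': [1, 2, 3], 'Volume': [5, 6, 7], 'ID': [1, 2, 3]},
                    index=[10, 20, 30])  # hypothetical non-default index
df_b = pd.DataFrame({'Time': [1, 2, 3], 'Volume': [2, 3, 4], 'ID': [2, 1, 3]})

# Given unique IDs in df_b, a left merge returns one row per df_a row,
# in df_a's order, with a fresh 0..n-1 index.
merged = pd.merge(df_a, df_b, on='ID', how='left')
diff = merged['Volume_x'] - merged['Volume_y']

# Assigning the Series directly aligns on index labels (0, 1, 2 vs 10, 20, 30),
# so no label matches and every Diff value is NaN.
by_label = df_a.assign(Diff=diff)

# Converting to a NumPy array drops the index, so values are assigned by position.
by_position = df_a.assign(Diff=diff.to_numpy())
print(by_position)
```

With the default 0..n-1 index used in the answer, both forms give the same result.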
