
Find the difference between two columns of two dataframes that share a common key column (pandas)

I have two dataframes with one having

Title  Name Quantity ID

as the columns

and the 2nd dataframe has

ID Quantity

as its columns, with fewer rows than the first dataframe.

I need to find the difference between the Quantity values of the two dataframes for rows whose ID matches, and I want to store this difference in a separate column in the first dataframe.

I tried this (it didn't work):

DF1[['ID','Quantity']].reset_index(drop=True).apply(lambda id_qty_tup  : DF2[DF2.ID==asin_qty_tup[0]].quantity - id_qty_tup[1] , axis = 1)

Another approach is to take the ID and Quantity of each row of DF1 and iterate through each row of DF2, but that takes more time. I'm sure there is a better way.

You can perform index-aligned subtraction, and pandas takes care of the rest.

df['Diff'] = df.set_index('ID').Quantity.sub(df2.set_index('ID').Quantity).values
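As a minimal sketch of the same idea, the snippet below uses made-up data and assumes that ID is unique within each dataframe. It differs from the one-liner in one respect: the result is assigned as a Series rather than through .values, so the rows are matched by ID label and the outcome does not depend on the row order of the aligned result. The names left, right and result are only illustrative.

```python
import pandas as pd

# Hypothetical data: ID is unique in each frame, and df2 covers only some IDs.
df = pd.DataFrame({'ID': ['B2', 'A1', 'C1', 'D4'],
                   'Quantity': [14, 1, 611, 7]})
df2 = pd.DataFrame({'ID': ['A1', 'B2', 'C1'],
                    'Quantity': [11, 51, 44]})

# Index both frames by ID so the subtraction pairs rows by ID label.
left = df.set_index('ID')
right = df2.set_index('ID')

# Assigning the Series (not its .values) lets pandas match rows by ID again.
left['Diff'] = left['Quantity'].sub(right['Quantity'])

# Restore ID as an ordinary column; D4 has no partner in df2, so its Diff is NaN.
result = left.reset_index()
print(result)
```

If ID can repeat in the first dataframe, as in the sample data of the map-based answer, this label-based assignment is not appropriate and the map approach is the safer choice.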

Demo
Here, changetype is the index, and I've already set it, so pd.Series.sub will align subtraction by default. Otherwise, you'd need to set the index as above.

df1
                      strings      test
changetype                             
0                     a very  -1.250150
1            very boring text -1.376637
2            I cannot read it -1.011108
3                 Hi everyone -0.527900
4             please go home  -1.010845
5               or I will go   0.008159
6                         now -0.470354

df2
                                     strings      test
changetype                                            
0                    a very very boring text  0.625465
1                           I cannot read it -1.487183
2                                Hi everyone  0.292866
3            please go home or I will go now  1.430081

df1.test.sub(df2.test)

changetype
0   -1.875614
1    0.110546
2   -1.303974
3   -1.957981
4         NaN
5         NaN
6         NaN
Name: test, dtype: float64
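The demo can be reproduced roughly as follows. This is a sketch that rebuilds only the test column from the rounded values printed above, so the last decimal place may differ slightly from the output shown. The fill_value variant at the end is an optional extra, not part of the original demo.

```python
import pandas as pd

# Values copied from the printed demo above (rounded to six decimals).
df1 = pd.DataFrame(
    {'test': [-1.250150, -1.376637, -1.011108, -0.527900,
              -1.010845, 0.008159, -0.470354]},
    index=pd.RangeIndex(7, name='changetype'))
df2 = pd.DataFrame(
    {'test': [0.625465, -1.487183, 0.292866, 1.430081]},
    index=pd.RangeIndex(4, name='changetype'))

# Labels 0-3 exist in both frames; labels 4-6 exist only in df1 and become NaN.
diff = df1['test'].sub(df2['test'])
print(diff)

# Optional: treat a missing partner as 0 instead of producing NaN.
diff_filled = df1['test'].sub(df2['test'], fill_value=0)
print(diff_filled)
```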

You can use map in this case:

df['diff'] = df['ID'].map(df2.set_index('ID').Quantity) - df.Quantity

Some Data

import pandas as pd
df = pd.DataFrame({'Title': ['A', 'B', 'C', 'D', 'E'],
                   'Name': ['AA', 'BB', 'CC', 'DD', 'EE'],
                   'Quantity': [1, 21, 14, 15, 611],
                   'ID': ['A1', 'A1', 'B2', 'B2', 'C1']})

df2 = pd.DataFrame({'Quantity': [11, 51, 44],
                    'ID': ['A1', 'B2', 'C1']})

We will use df2 to build a lookup from ID to Quantity (a Series indexed by ID, which map uses like a dictionary). Anywhere df has ID == A1 it gets assigned the Quantity 11, B2 gets 51, and C1 gets 44. Here I'll add it as another column just for illustration purposes.

df['Quantity2'] = df['ID'].map(df2.set_index('ID').Quantity)
print(df)

   ID Name  Quantity Title  Quantity2
0  A1   AA         1     A         11
1  A1   BB        21     B         11
2  B2   CC        14     C         51
3  B2   DD        15     D         51
4  C1   EE       611     E         44

Then subtract df['Quantity'] from the column we just created to get the difference (or subtract the new column from df['Quantity'] if you want the opposite sign).
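Putting the steps together, a self-contained sketch with the sample data from this answer might look like this. The variable name lookup is introduced here only for readability.

```python
import pandas as pd

df = pd.DataFrame({'Title': ['A', 'B', 'C', 'D', 'E'],
                   'Name': ['AA', 'BB', 'CC', 'DD', 'EE'],
                   'Quantity': [1, 21, 14, 15, 611],
                   'ID': ['A1', 'A1', 'B2', 'B2', 'C1']})

df2 = pd.DataFrame({'Quantity': [11, 51, 44],
                    'ID': ['A1', 'B2', 'C1']})

# Series indexed by ID; map looks up each ID of df in it.
lookup = df2.set_index('ID')['Quantity']

# df2's quantity minus df's quantity, row by row; repeated IDs in df are fine.
df['diff'] = df['ID'].map(lookup) - df['Quantity']
print(df)
```

Any ID in df that does not occur in df2 maps to NaN, so its diff is NaN as well. The lookup itself needs unique ID values in df2.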
