Pandas lookup from same dataframe for criteria then add to right as new column

My goal is to create an Excel VLOOKUP equivalent in Python that takes the previous month's value and adds it as a new column on the current month's row, i.e. id, month, value_current_month, value_past_month:

From this:

id  month  value
01     09    123
02     09    234
03     09    345
01     08    543
02     08    432
03     08    321
01     07    678
02     07    789
03     07    890
..     ..    ...

To this:

id  month  value  new
01     09    123  543
02     09    234  432
03     09    345  321
01     08    543  678
02     08    432  789
03     08    321  890
01     07    678  ...
02     07    789  ...
03     07    890  ...
..     ..    ...  ...

I have imported pandas and numpy and created a dataframe called "df". As I am unfamiliar with Python syntax, any help would be greatly appreciated.

Thank you!

  1. The proper way to do this is to create a Date column. Since you will likely have multiple years, you cannot just join on month.
  2. Then merge the dataframe with a copy of itself whose dates are shifted forward one month with + pd.DateOffset(months=1), joining on Date and id:

#sample dataframe setup
import pandas as pd
df = pd.DataFrame({'id': {0: '01',1: '02',2: '03',3: '01',4: '02',5: '03',6: '01',7: '02',8: '03'},
'month': {0: '09',1: '09',2: '09',3: '08',4: '08',5: '08', 6: '07',7: '07',8: '07'},
'value': {0: 123,1: 234,2: 345,3: 543,4: 432,5: 321, 6: 678,7: 789,8: 890}})
df

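#solution 1
df['Year'] = '2020'  # placeholder year; use the real year if your data has one
df['Date'] = pd.to_datetime(df['Year'] + '-' + df['month'])
# copy with every date moved forward one month, so last month's value
# lines up with the current month's row in the merge
prev = (df[['Date', 'value', 'id']]
        .rename({'value': 'new_value'}, axis=1)
        .assign(Date=lambda d: d['Date'] + pd.DateOffset(months=1)))
# left join keeps every original row; the oldest month has no match and gets NaN
df = pd.merge(df, prev, how='left', on=['Date', 'id']).drop('Date', axis=1)
df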
Out[1]: 
   id month  value  Year  new_value
0  01    09    123  2020      543.0
1  02    09    234  2020      432.0
2  03    09    345  2020      321.0
3  01    08    543  2020      678.0
4  02    08    432  2020      789.0
5  03    08    321  2020      890.0
6  01    07    678  2020        NaN
7  02    07    789  2020        NaN
8  03    07    890  2020        NaN
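
If the data already carries a real year, the same merge also handles the January-to-December rollover, which joining on month alone cannot. A minimal sketch, assuming a hypothetical `year` column and made-up values (neither is in the original question); `df_years`, `prev` and `value_past_month` are illustrative names only:

```python
import pandas as pd

# Hypothetical data spanning a year boundary; the 'year' column and values are assumptions.
df_years = pd.DataFrame({
    'id':    ['01', '02', '01', '02'],
    'year':  ['2021', '2021', '2020', '2020'],
    'month': ['01', '01', '12', '12'],
    'value': [10, 20, 30, 40],
})

# Build a real date from year and month (the day defaults to the 1st).
df_years['Date'] = pd.to_datetime(df_years['year'] + '-' + df_years['month'],
                                  format='%Y-%m')

# Shift a copy forward one month so each value lands on the following month.
prev = (df_years[['id', 'Date', 'value']]
        .rename(columns={'value': 'value_past_month'})
        .assign(Date=lambda d: d['Date'] + pd.DateOffset(months=1)))

# Left join: January 2021 rows pick up December 2020 values; December rows get NaN.
result = df_years.merge(prev, how='left', on=['id', 'Date']).drop(columns='Date')
print(result)
```

Because the join key is a full date rather than a bare month number, December 2020 is matched to January 2021 and not to any other January.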

Alternatively, use .shift(-3) if the problem is simple and you have exactly three id values per month. Change -3 to -12, for example, if your actual dataframe has 12 id values per month. This also assumes the dataframe is sorted by month in descending order, with the ids in the same order within each month:

#solution 2
df['new'] = df['value'].shift(-3)
df

Out[2]: 
   id month  value    new
0  01    09    123  543.0
1  02    09    234  432.0
2  03    09    345  321.0
3  01    08    543  678.0
4  02    08    432  789.0
5  03    08    321  890.0
6  01    07    678    NaN
7  02    07    789    NaN
8  03    07    890    NaN
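
If the number of ids per month varies, a fixed shift would misalign rows. One possible alternative is to sort by month and shift within each id. This is only a sketch: it assumes every id has a row for every consecutive month and that all rows fall within a single year (so the zero-padded month strings sort correctly). The names `df2` and `ordered` are illustrative.

```python
import pandas as pd

df2 = pd.DataFrame({
    'id':    ['01', '02', '03', '01', '02', '03', '01', '02', '03'],
    'month': ['09', '09', '09', '08', '08', '08', '07', '07', '07'],
    'value': [123, 234, 345, 543, 432, 321, 678, 789, 890],
})

# Sort oldest-first within each id, then take the previous row's value per id.
# Assumes no month is missing for any id; otherwise merging on a Date column is safer.
ordered = df2.sort_values(['id', 'month'])

# The shifted Series keeps the original index, so assignment realigns it to df2's row order.
df2['new'] = ordered.groupby('id')['value'].shift(1)
print(df2)
```

Unlike a fixed .shift(-3), this does not depend on how many ids there are or on the initial row order, but a gap in an id's months would still pull in the wrong value.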
