
How can I create a lag variable for a particular variable for each ID?

I want to create a lag variable named lag_ins, which should look like this:

year   ID    emissions   ins    lag_ins
2010   1     10          0       NaN
2011   1     20          1       0
2012   1     30          1       1
2010   2     10          1       NaN
2011   2     20          0       1
2012   2     40          1       0

I have tried the following code:

df['ID'] = df.groupby(['year']).cumcount()+1
df4['lag_ins'] = np.insert(df.ins.values,0,0)[:1]
df.loc[df.groupby(["ID"]).cumcount() == 0,'lag_ins']= np.nan

But it does not work.

You can just use groupby.shift, which shifts ins down by one row within each ID:

df['lag_ins'] = df.groupby('ID').ins.shift()

df
#   year  ID  emissions  ins  lag_ins
#0  2010   1         10    0      NaN
#1  2011   1         20    1      0.0
#2  2012   1         30    1      1.0
#3  2010   2         10    1      NaN
#4  2011   2         20    0      1.0
#5  2012   2         40    1      0.0
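For a version that can be run end to end, here is a minimal sketch that rebuilds the table from the question and applies the same shift. The frame is typed in by hand from the question's data; nothing else is assumed.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "year": [2010, 2011, 2012, 2010, 2011, 2012],
    "ID": [1, 1, 1, 2, 2, 2],
    "emissions": [10, 20, 30, 10, 20, 40],
    "ins": [0, 1, 1, 1, 0, 1],
})

# Shift ins down by one row inside each ID group.
# The first row of every group has no predecessor, so it becomes NaN.
df["lag_ins"] = df.groupby("ID")["ins"].shift()

print(df)
```

The column comes back as float because NaN cannot be stored in a plain integer column.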

If the shift has to follow year order, sort by year first. The assignment aligns on the index, so the rows of df keep their original order:

df['lag_ins'] = df.sort_values('year').groupby('ID').ins.shift()

df
#   year  ID  emissions  ins  lag_ins
#0  2010   1         10    0      NaN
#1  2011   1         20    1      0.0
#2  2012   1         30    1      1.0
#3  2010   2         10    1      NaN
#4  2011   2         20    0      1.0
#5  2012   2         40    1      0.0
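With the question's data the sorted and unsorted versions give the same result, because the rows are already in year order within each ID. The sketch below uses a hypothetical frame named shuffled, with the same ins values but rows deliberately out of order, to show where sorting makes a difference.

```python
import pandas as pd

# Hypothetical example: same ins values as the question, rows out of year order.
shuffled = pd.DataFrame({
    "year": [2012, 2010, 2011, 2011, 2012, 2010],
    "ID": [1, 1, 1, 2, 2, 2],
    "ins": [1, 0, 1, 0, 1, 1],
})

# Sort by year, shift within each ID, then assign back.
# The result is aligned on the original index, so shuffled keeps its row order.
shuffled["lag_ins"] = (
    shuffled.sort_values("year").groupby("ID")["ins"].shift()
)

print(shuffled.sort_values(["ID", "year"]))
```

Each row now holds the previous year's ins for its ID, and the earliest year of each ID is NaN.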

The technical posts on this site are licensed under CC BY-SA 4.0.

 