I want to create a lag variable named lag_ins,
which should look like this:
year ID emissions ins lag_ins
2010 1 10 0 NaN
2011 1 20 1 0
2012 1 30 1 1
2010 2 10 1 NaN
2011 2 20 0 1
2012 2 40 1 0
I have tried the following code:
df['ID'] = df.groupby(['year']).cumcount()+1
df4['lag_ins'] = np.insert(df.ins.values,0,0)[:1]
df.loc[df.groupby(["ID"]).cumcount() == 0,'lag_ins']= np.nan
But it does not work.
You can just use groupby.shift, which shifts ins by one row within each ID group:
df['lag_ins'] = df.groupby('ID').ins.shift()
df
# year ID emissions ins lag_ins
#0 2010 1 10 0 NaN
#1 2011 1 20 1 0.0
#2 2012 1 30 1 1.0
#3 2010 2 10 1 NaN
#4 2011 2 20 0 1.0
#5 2012 2 40 1 0.0
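For anyone who wants to reproduce this, here is a minimal self-contained sketch. It assumes the sample data from the question is built by hand as a DataFrame named df; the construction step is my assumption, since the question does not show how df was created.

```python
import pandas as pd

# Sample data copied from the question (built by hand here as an assumption)
df = pd.DataFrame({
    "year": [2010, 2011, 2012, 2010, 2011, 2012],
    "ID": [1, 1, 1, 2, 2, 2],
    "emissions": [10, 20, 30, 10, 20, 40],
    "ins": [0, 1, 1, 1, 0, 1],
})

# Shift ins down by one row within each ID group.
# The first row of each group has no previous row, so it becomes NaN.
df["lag_ins"] = df.groupby("ID")["ins"].shift()
print(df)
```

Because NaN is a float, the new column is float64 even though ins holds integers.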
And if the rows are not already in year order within each ID, sort by year before shifting:
df['lag_ins'] = df.sort_values('year').groupby('ID').ins.shift()
df
# year ID emissions ins lag_ins
#0 2010 1 10 0 NaN
#1 2011 1 20 1 0.0
#2 2012 1 30 1 1.0
#3 2010 2 10 1 NaN
#4 2011 2 20 0 1.0
#5 2012 2 40 1 0.0
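The sample data is already sorted, so both versions print the same thing. A small sketch of a case where sorting matters is below; the frame named shuffled and its row order are hypothetical, made up only to illustrate the point. The result of the sorted shift is assigned back by index label, so each value lands on the right row even though the frame itself is never reordered.

```python
import pandas as pd

# Same values as the question, but rows deliberately out of year order (hypothetical)
shuffled = pd.DataFrame({
    "year": [2012, 2010, 2011, 2011, 2012, 2010],
    "ID": [1, 1, 1, 2, 2, 2],
    "emissions": [30, 10, 20, 20, 40, 10],
    "ins": [1, 0, 1, 0, 1, 1],
})

# Sort by year only for the shift; assignment aligns on the original index
shuffled["lag_ins"] = shuffled.sort_values("year").groupby("ID")["ins"].shift()
print(shuffled)
```

Here the 2010 row of each ID gets NaN, and every other row gets the ins value of the previous year for the same ID.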