I have the following problem. This is my dataframe:
district curfew_name active value date
A np.nan 0 10 1
A B1 1 20 4
A B1 1 21 6
C D1 1 14 8
C D1 1 16 11
C D2 1 14 13
E F1 0 30 10
E F1 1 14 12
Each row is a date (2-3 days between rows) on which a district might have a curfew activated. For each curfew, I want to know the value column's value for that district on the date before the curfew's first activation. In this case, curfew B1 gets activated on date 4, so I check the previous value for that district, which is 10. For curfew D1 there is no previous value for that district, so I would get a nan. For D2 the previous value is D1's last value: 16. Finally, F1 was announced beforehand, so it has a row with active 0 before it becomes active; the previous value would be 30 either way. So my final Series would look like this:
curfew_name previous_value
B1 10
D1 np.nan
D2 16
F1 30
So, I can get each curfew's first appearance like this:
df[df.active.eq(1)].reset_index().groupby('curfew_name').first()['index']
Then I simply tried subtracting one and extracting the rows at those indexes:
idx = df[df.active.eq(1)].reset_index().groupby('curfew_name').first()['index'] - 1
But for cases like D1 this would get me 21, which is a value from another district. How would you go about it? I've tried some combinations of groupby('district'), shift() and eq(), but I still haven't found an efficient way to do it.
Thanks!
Edit: my approach for now would be to get the previous index, check whether the row at that index is in the same district as the original row, and keep only the rows where that condition is met, but I'm quite sure I can do something better.
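The approach described in the edit could be sketched roughly as follows. This is only an illustration, assuming the frame has a default integer index and that rows are already ordered by date within each district; the names first_idx, prev_idx and same_district are made up for the example.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "district": ["A", "A", "A", "C", "C", "C", "E", "E"],
    "curfew_name": [np.nan, "B1", "B1", "D1", "D1", "D2", "F1", "F1"],
    "active": [0, 1, 1, 1, 1, 1, 0, 1],
    "value": [10, 20, 21, 14, 16, 14, 30, 14],
    "date": [1, 4, 6, 8, 11, 13, 10, 12],
})

# Index of the first active row of each curfew.
first_idx = (
    df[df["active"].eq(1)]
    .reset_index()
    .groupby("curfew_name")["index"]
    .first()
)
prev_idx = (first_idx - 1).to_numpy()

# reindex (rather than loc) gives NaN for a previous index that does not exist.
prev_district = df["district"].reindex(prev_idx).to_numpy()
prev_value = df["value"].reindex(prev_idx).to_numpy(dtype=float)

# Keep the previous value only when it comes from the same district.
same_district = prev_district == df.loc[first_idx, "district"].to_numpy()
previous_value = pd.Series(
    np.where(same_district, prev_value, np.nan),
    index=first_idx.index,
    name="previous_value",
)
print(previous_value)
```

It works on the sample data, but it depends on the row order, which is why a per-district shift is probably the cleaner route.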
You could try this:
(df.assign(previous_value=df.groupby('district').value.shift()) # usual groupby.shift
.drop_duplicates(['district','curfew_name']) # drop all duplicates
[['curfew_name','previous_value']] # select the columns of interest
.dropna(subset=['curfew_name']) # ignore curfew with nan values
)
Output:
curfew_name previous_value
1 B1 10.0
3 D1 NaN
5 D2 16.0
7 F1 30.0
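One caveat: with the sample data exactly as posted in the question, the first (district, curfew_name) occurrence of F1 is the row with active 0, so drop_duplicates on its own would keep that row rather than index 7. A possible variation, assuming the goal is the first *active* row of each curfew and that rows are sorted by date within each district, is to filter on active before dropping duplicates:

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "district": ["A", "A", "A", "C", "C", "C", "E", "E"],
    "curfew_name": [np.nan, "B1", "B1", "D1", "D1", "D2", "F1", "F1"],
    "active": [0, 1, 1, 1, 1, 1, 0, 1],
    "value": [10, 20, 21, 14, 16, 14, 30, 14],
    "date": [1, 4, 6, 8, 11, 13, 10, 12],
})

previous_value = (
    # Previous value within the same district (NaN for a district's first row).
    df.assign(previous_value=df.groupby("district")["value"].shift())
      # Keep only rows where a curfew is active.
      .loc[lambda d: d["active"].eq(1)]
      # First active row of each curfew in each district.
      .drop_duplicates(["district", "curfew_name"])
      .set_index("curfew_name")["previous_value"]
)
print(previous_value)
```

The shift is computed on the full frame before filtering, so a row with active 0 (like the first F1 row) still counts as the "previous" row.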
Getting inspiration from @Quang Hoang's answer and my initial approach I managed to do it:
df['previous_value'] = df.groupby('district').value.shift()
idx = df[df.active.eq(1)].reset_index().groupby('curfew_name').first()['index']
previous_values = df[df.index.isin(idx)].set_index('curfew_name').previous_value