
Get previous row value by group after condition is met

I have the following problem. This is my dataframe:

district    curfew_name        active   value    date
  A            np.nan            0       10       1
  A             B1               1       20       4
  A             B1               1       21       6
  C             D1               1       14       8      
  C             D1               1       16       11
  C             D2               1       14       13
  E             F1               0       30       10
  E             F1               1       14       12

So, each row is a date (2-3 days between rows) on which a district might have a curfew active. For each curfew, I want to know the value column's value for that district on the date before the curfew's first activation. In this case, curfew B1 gets activated on date 4, so I check the district's previous value and get 10. For D1 there is no previous value for that district, so I would get a nan. For D2 the previous value is D1's last value: 16. Finally, F1 was announced beforehand, so there is a row with active 0 before it becomes active; the previous value would be 30 either way. So, my final Series would look like this:

curfew_name    previous_value
    B1              10
    D1             np.nan
    D2              16
    F1              30
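
For reference, the example frame can be built like this (just a sketch of the data shown above, assuming the usual pandas/numpy imports):

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'district':    ['A', 'A', 'A', 'C', 'C', 'C', 'E', 'E'],
    'curfew_name': [np.nan, 'B1', 'B1', 'D1', 'D1', 'D2', 'F1', 'F1'],
    'active':      [0, 1, 1, 1, 1, 1, 0, 1],
    'value':       [10, 20, 21, 14, 16, 14, 30, 14],
    'date':        [1, 4, 6, 8, 11, 13, 10, 12],
})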

So, I can get each curfew's first appearance like this:

df[df.active.eq(1)].reset_index().groupby('curfew_name').first()['index']
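
For the sample data this should return the row index of each curfew's first active row, something like:

curfew_name
B1    1
D1    3
D2    5
F1    7
Name: index, dtype: int64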

And then I simply tried subtracting one and extracting those indexes:

idx = df[df.active.eq(1)].reset_index().groupby('curfew_name').first()['index'] - 1

But for cases like D1 this would get me 21, which is a value from another district. How would you go about it? I've tried some combinations of groupby('district'), shift() and eq(), but I still haven't found an efficient way.

Thanks!

Edit: my approach for now would be to get the previous index, check whether the row at that index belongs to the same district as the original row, and only keep the value when that condition is met, but I'm quite sure I can do something better.
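
In code, that idea would look something like the sketch below (a sketch only, assuming numpy/pandas are imported as above; first_idx, prev_idx and same_district are illustrative names, and it ignores the edge case where a curfew's first active row is the very first row of the frame):

first_idx = df[df.active.eq(1)].reset_index().groupby('curfew_name').first()['index']
prev_idx = first_idx - 1                                       # row just before the first activation
same_district = (df.loc[prev_idx, 'district'].to_numpy()
                 == df.loc[first_idx, 'district'].to_numpy())  # is the previous row in the same district?
previous_value = pd.Series(np.where(same_district, df.loc[prev_idx, 'value'], np.nan),
                           index=first_idx.index, name='previous_value')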

You could try this:

(df.assign(previous_value=df.groupby('district').value.shift())  # usual groupby.shift within each district
   .query('active == 1')                                         # keep only rows where the curfew is active
   .drop_duplicates(['district', 'curfew_name'])                 # keep the first activation of each curfew
   [['curfew_name', 'previous_value']]                           # select the columns of interest
   .dropna(subset=['curfew_name'])                               # ignore rows without a curfew
)

Output:

  curfew_name  previous_value
1          B1            10.0
3          D1             NaN
5          D2            16.0
7          F1            30.0
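
If you want this as a Series indexed by curfew_name, as in the question, one option is to finish the same chain with set_index, e.g.:

result = (df.assign(previous_value=df.groupby('district').value.shift())
            .query('active == 1')
            .drop_duplicates(['district', 'curfew_name'])
            .set_index('curfew_name')['previous_value'])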

Taking inspiration from @Quang Hoang's answer and my initial approach, I managed to do it:

df['previous_value'] = df.groupby('district').value.shift()                       # previous value within each district
idx = df[df.active.eq(1)].reset_index().groupby('curfew_name').first()['index']   # first active row of each curfew
previous_values = df[df.index.isin(idx)].set_index('curfew_name').previous_value  # previous value at those rows
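
For the sample frame this should give the expected Series, something like:

curfew_name
B1    10.0
D1     NaN
D2    16.0
F1    30.0
Name: previous_value, dtype: float64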
