
Count how many consecutive rows meet a condition with pandas

I have a table like this:

import pandas as pd
df = pd.DataFrame({
    "day": [1, 2, 3, 4, 5, 6],
    "tmin": [-2, -3, -1, -4, -4, -2]
})

I want to create columns like these:

df['days_under_0_until_now'] = [1, 2, 3, 4, 5, 6]
df['days_under_-2_until_now'] = [1, 2, 0, 1, 2, 3]
df['days_under_-3_until_now'] = [0, 1, 0, 1, 2, 0]

So days_under_X_until_now is the number of consecutive days, up to and including the current one, on which tmin was less than or equal to X.

I'd like to avoid doing this with loops, since the data is huge. Is there an alternative way to do it?

To improve performance, avoid groupby: compare the column's values against the list of thresholds using broadcasting, then use this solution for counting consecutive True values:

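import numpy as np

vals = [0, -2, -3]

# Broadcast: one boolean column per threshold, True where tmin <= threshold
arr = df['tmin'].to_numpy()[:, None] <= np.array(vals)[None, :]
cols = [f'days_under_{v}_until_now' for v in vals]
df1 = pd.DataFrame(arr, columns=cols, index=df.index)

# Running count of True values minus the count reached at the last False row
b = df1.cumsum()
df = df.join(b.sub(b.mask(df1).ffill().fillna(0)).astype(int))
print(df)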
   day  tmin  days_under_0_until_now  days_under_-2_until_now  \
0    1    -2                       1                        1   
1    2    -3                       2                        2   
2    3    -1                       3                        0   
3    4    -4                       4                        1   
4    5    -4                       5                        2   
5    6    -2                       6                        3   

   days_under_-3_until_now  
0                        0  
1                        1  
2                        0  
3                        1  
4                        2  
5                        0  
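
To see why the cumsum / mask / ffill combination counts consecutive True values, it may help to walk through it on a single boolean column. The sketch below is only an illustration: the helper name consecutive_true_counts is hypothetical (it is not part of pandas), and the two masks are the tmin <= -2 and tmin <= -3 comparisons from the example data.

```python
import pandas as pd


def consecutive_true_counts(mask: pd.Series) -> pd.Series:
    """Length of the current run of True values at each row (hypothetical helper)."""
    running = mask.cumsum()  # total number of True values seen so far
    # Keep the running total only at False rows, carry it forward,
    # and use 0 before the first False row.
    baseline = running.mask(mask).ffill().fillna(0)
    # Subtracting the total at the last False row leaves the current streak.
    return (running - baseline).astype(int)


tmin = pd.Series([-2, -3, -1, -4, -4, -2])

under_2 = consecutive_true_counts(tmin <= -2)
under_3 = consecutive_true_counts(tmin <= -3)

print(under_2.tolist())  # [1, 2, 0, 1, 2, 3]
print(under_3.tolist())  # [0, 1, 0, 1, 2, 0]
```

For tmin <= -2 the running total is 1, 2, 2, 3, 4, 5 and the carried-forward baseline is 0, 0, 2, 2, 2, 2, so the difference resets to 0 at each False row and restarts from 1 afterwards.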

