I have a table like this:
import pandas as pd
df = pd.DataFrame({
"day": [1, 2, 3, 4, 5, 6],
"tmin": [-2, -3, -1, -4, -4, -2]
})
I want to create a column like this:
df['days_under_0_until_now'] = [1, 2, 3, 4, 5, 6]
df['days_under_-2_until_now'] = [1, 2, 0, 1, 2, 3]
df['days_under_-3_until_now'] = [0, 1, 0, 1, 2, 0]
So days_under_X_until_now
means how many consecutive days until now tmin
was under or equals X
I'd like to avoid do this with loops since the data is huge. Is there an alternative way to do it?
For improve performance avoid using groupby
compare values of column to list and then use this solution for count consecutive True
s:
vals = [0,-2,-3]
arr = df['tmin'].to_numpy()[:, None] <= np.array(vals)[ None, :]
cols = [f'days_under_{v}_until_now' for v in vals]
df1 = pd.DataFrame(arr, columns=cols, index=df.index)
b = df1.cumsum()
df = df.join(b.sub(b.mask(df1).ffill().fillna(0)).astype(int))
print (df)
day tmin days_under_0_until_now days_under_-2_until_now \
0 1 -2 1 1
1 2 -3 2 2
2 3 -1 3 0
3 4 -4 4 1
4 5 -4 5 2
5 6 -2 6 3
days_under_-3_until_now
0 0
1 1
2 0
3 1
4 2
5 0
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.