I'm dealing with a DataFrame like this:
n_days probability
0 0.01
17 0.1
18 0.11
19 0.12
40 0.2
41 0.21
I want to group consecutive numbers and get the mean probability of each group, like this:
n_days mean_probability
0 0.01
17-19 0.11
40-41 0.205
The formatting of n_days isn't too relevant.
I tried something like:
df['diff_days'] = df.n_days - df.n_days.shift()
And then:
df.diff_days.eq(1)
Which gives this boolean column:
n_days probability bool_eq
0 0.01 False
17 0.1 False
18 0.11 True
19 0.12 True
40 0.2 False
41 0.21 True
That seems to be a step forward, but I'm not sure how to follow up. Each False would be the start of a group, but how would I catch the whole group? Any help would be appreciated. Thanks.
You could use pd.cut + DataFrame.groupby:
mean_probability = df.groupby(pd.cut(df.n_days, len(df) // 2)).probability.mean()
n_days
(-0.041, 13.667] 0.010
(13.667, 27.333] 0.110
(27.333, 41.0] 0.205
Name: probability, dtype: float64
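One thing worth checking with this approach: pd.cut with an integer splits the n_days range into equal-width bins, so it does not look at whether values are actually consecutive. Here is a small sketch of that, assuming a hypothetical extra row with n_days = 21 and a made-up probability of 0.13 (neither is in the original data):

```python
import pandas as pd

# Sample data from the question plus one hypothetical extra row (n_days = 21).
df = pd.DataFrame({
    "n_days": [0, 17, 18, 19, 21, 40, 41],
    "probability": [0.01, 0.1, 0.11, 0.12, 0.13, 0.2, 0.21],
})

# Same binning as above: len(df) // 2 equal-width bins over the n_days range.
bins = pd.cut(df["n_days"], len(df) // 2)
bin_codes = bins.cat.codes.tolist()

# 19 and 21 are not consecutive, yet they land in the same bin.
same_bin = bin_codes[3] == bin_codes[4]
print(bin_codes, same_bin)
```

So the equal-width bins happen to match the runs in the question's sample, but that may not hold for other data.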
You can group on pd.cut bins. Note that each bin excludes its first value and includes its last, e.g. (16, 19] is equivalent to [17, 19] where the column consists of integers.
>>> bins = [-1, 0, 16, 19, 39, 41]
>>> df.groupby(pd.cut(df['n_days'], bins))['probability'].mean().dropna()
n_days
(-1, 0] 0.010
(16, 19] 0.110
(39, 41] 0.205
Name: probability, dtype: float64
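If you would rather not write the bins by hand, the diff idea from the question can be extended. This is only a sketch, not something taken from the answers above: it assumes n_days is sorted and treats every row whose gap to the previous row is not exactly 1 as the start of a new group. The names group_id, first_day and last_day are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "n_days": [0, 17, 18, 19, 40, 41],
    "probability": [0.01, 0.1, 0.11, 0.12, 0.2, 0.21],
})

# A new group starts wherever the gap to the previous row is not exactly 1.
# The first row has a NaN diff, so it also starts a group.
# The running sum of those "group start" flags gives each row a group label.
group_id = df["n_days"].diff().ne(1).cumsum().rename("group_id")

result = df.groupby(group_id).agg(
    first_day=("n_days", "min"),
    last_day=("n_days", "max"),
    mean_probability=("probability", "mean"),
)
print(result)
```

The first_day and last_day columns can then be joined into a label like 17-19 if that formatting is needed.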