
Pandas group by consecutive numbers

I'm dealing with a DataFrame like this:

n_days    probability
 0            0.01
 17           0.1
 18           0.11
 19           0.12
 40           0.2
 41           0.21

I want to group consecutive numbers and get the mean probability of each group, like this:

n_days     mean_probability
  0           0.01
 17-19        0.11
 40-41        0.205

Formatting on the n_days isn't too relevant.

I tried something like:

df['diff_days'] = df.n_days - df.n_days.shift()

And then:

df.diff_days.eq(1)

Which brings this boolean:

n_days    probability   bool_eq
 0            0.01       False
 17           0.1        False
 18           0.11       True
 19           0.12       True
 40           0.2        False
 41           0.21       True

Which seems to be a step forward, but I'm not sure how to follow up. Each False would be the start of each group, but how would I catch the whole group? Any help would be appreciated. Thanks.

You could use pd.cut + DataFrame.groupby. Passing an integer to pd.cut splits the range of n_days into that many equal-width bins:

mean_probability = df.groupby(pd.cut(df.n_days, len(df) // 2)).probability.mean()

n_days
(-0.041, 13.667]    0.010
(13.667, 27.333]    0.110
(27.333, 41.0]      0.205
Name: probability, dtype: float64
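The equal-width bins above happen to line up with the gaps in this particular data. For differently spaced values, a minimal sketch that builds on the diff idea from the question is to start a new group wherever the gap to the previous row is not 1 and take a cumulative sum of those flags. The names run_id, first_day and last_day are made up for illustration, and the sketch assumes n_days is already sorted:

```python
import pandas as pd

df = pd.DataFrame({
    "n_days": [0, 17, 18, 19, 40, 41],
    "probability": [0.01, 0.1, 0.11, 0.12, 0.2, 0.21],
})

# A new run starts wherever the gap to the previous row is not exactly 1.
# The first row has a NaN gap, so it also counts as the start of a run.
starts_run = df["n_days"].diff().ne(1)

# The cumulative sum of the start flags gives every row in a run the same id.
run_id = starts_run.cumsum()  # hypothetical name, not from the original post

result = df.groupby(run_id).agg(
    first_day=("n_days", "min"),
    last_day=("n_days", "max"),
    mean_probability=("probability", "mean"),
)
print(result)
```

Each row of result describes one run of consecutive days, so a label such as 17-19 can be assembled from first_day and last_day if the formatting matters.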

You can group on pd.cut bins. Note that each bin excludes its left edge and includes its right edge, e.g. (16, 19] is equivalent to [17, 19] when the column consists of integers.

>>> bins = [-1, 0, 16, 19, 39, 41]
>>> df.groupby(pd.cut(df['n_days'], bins))['probability'].mean().dropna()
n_days
(-1, 0]     0.010
(16, 19]    0.110
(39, 41]    0.205
Name: probability, dtype: float64
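The bin edges above are typed in by hand. As a rough sketch, the same right-closed intervals could be derived from the data by locating the first and last row of each consecutive run. The variable names are illustrative, and the sketch assumes n_days is a sorted integer column without duplicates:

```python
import pandas as pd

df = pd.DataFrame({
    "n_days": [0, 17, 18, 19, 40, 41],
    "probability": [0.01, 0.1, 0.11, 0.12, 0.2, 0.21],
})

days = df["n_days"]
starts_run = days.diff().ne(1)     # True on the first row of each run
ends_run = days.diff(-1).ne(-1)    # True on the last row of each run

# (start - 1, end] matches the right-closed intervals pd.cut uses by default.
intervals = pd.IntervalIndex.from_arrays(
    days[starts_run].to_numpy() - 1,
    days[ends_run].to_numpy(),
    closed="right",
)

# observed=True keeps only bins that actually contain rows.
mean_probability = (
    df.groupby(pd.cut(days, intervals), observed=True)["probability"].mean()
)
print(mean_probability)
```

Because the intervals only cover the runs themselves, there are no empty bins between them, so the dropna() step is not needed here.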
