
Sort data ranges with pandas.cut

I am trying to understand how to build a table from data that I have binned with pandas.cut, so that the bins are listed in the right order. I generate random ages with the following code:

import numpy as np
import pandas as pd
ages = np.random.standard_normal(1000)*20+30
ages[ages<0]=0
ages[ages>120]=120

I bin the data using these lines:

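# Truncate to whole years; astype is used because newer pandas versions may
# refuse to coerce non-integer floats when dtype=int is passed to pd.Series
ages = pd.Series(ages.astype(int))
ages_cut = pd.cut(ages, [0, 20, 40, 60, 80, 100, 120])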

However, when I call ages_cut.value_counts(), I get a table with the age ranges in the wrong order (sorted by count rather than by range):

(20, 40]      379
(0, 20]       268
(40, 60]      233
(60, 80]       56
(80, 100]       3
(100, 120]      0
dtype: int64

In addition to @QuangHoang's comment, you can call value_counts with the bins parameter:

bins : int, optional

Rather than count values, group them into half-open bins, a convenience for pd.cut, only works with numeric data.
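To make "a convenience for pd.cut" a little more concrete, here is a minimal sketch. It assumes that value_counts(bins=...) behaves like pd.cut with include_lowest=True, which is what the widened first interval in its output suggests. The small ages series is made up for illustration. Note that the left edge 0 is counted in the first bin here, whereas a plain pd.cut call leaves it out.

```python
import pandas as pd

# Made-up ages for illustration; 0 sits exactly on the lowest bin edge
ages = pd.Series([0, 5, 20, 25, 40])
bins = [0, 20, 40]

# Binning through value_counts; sort=False keeps the bins in order
via_bins = ages.value_counts(bins=bins, sort=False)

# Assumed equivalent: pd.cut with include_lowest=True, then counting
via_cut = pd.cut(ages, bins, include_lowest=True).value_counts(sort=False)

# Plain pd.cut uses (0, 20] as the first bin, so the age 0 becomes NaN
default_cut = pd.cut(ages, bins).value_counts(sort=False)

print(via_bins)
print(via_cut)
print(default_cut)
```

If ages of exactly 0 matter (the question clips negative values to 0), this difference changes the count of the first bin.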

>>> ages.value_counts(bins=[0,20,40,60,80,100,120], sort=False)
(-0.001, 20.0]    334
(20.0, 40.0]      382
(40.0, 60.0]      224
(60.0, 80.0]       54
(80.0, 100.0]       6
(100.0, 120.0]      0
dtype: int64
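If you would rather keep the pd.cut result from the question, the bins can also be listed in order by turning off the sort by count or by sorting on the index. The following is a small sketch with made-up ages, not the data from the question:

```python
import pandas as pd

# Made-up ages for illustration
ages = pd.Series([5, 15, 25, 25, 35, 35, 35, 45, 70])
ages_cut = pd.cut(ages, [0, 20, 40, 60, 80])

# Default: rows are sorted by count, largest first
by_count = ages_cut.value_counts()

# Option 1: skip the sort by count; categorical bins keep their category order
in_order = ages_cut.value_counts(sort=False)

# Option 2: sort the result by its (ordered, categorical) index
in_order_too = ages_cut.value_counts().sort_index()

print(by_count)
print(in_order)
```

Both options keep empty bins in the table with a count of 0, because pd.cut returns categorical data.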
