Grouping values in Pandas value_counts()

I want to create a histogram from my pandas DataFrame. I have one column in which I store percentage values. I used value_counts(), but there are too many distinct percentage values. Example:

0.752        1
0.769        2
0.800        1
0.823        1
          ... 
80.365       1
84.000       1
84.615       1
85.000       10
85.714       1

I need to group these values into ranges of equal width, for example 5 % each (0 - 4.999, 5.000 - 9.999, ...). I want this result:

(Example)

0  - 4.999       24
5  - 9.999       12
10 - 14.999      30
...

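You can group your data by the result of the pd.cut() method (the snippets assume import numpy as np and import pandas as pd). Here df has one row per distinct value together with its count: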

In [38]: df
Out[38]:
    value  count
0   0.752      1
1  11.769      3
2  22.800      4
3  33.823      5
4  55.365      1
5  84.000      1
6  84.615      1
7  85.000     10
8  99.714      1

In [39]: df.groupby(pd.cut(df.value, bins=np.linspace(0, 100, 21)))['count'].sum().fillna(0)
Out[39]:
value
(0, 5]        1.0
(5, 10]       0.0
(10, 15]      3.0
(15, 20]      0.0
(20, 25]      4.0
(25, 30]      0.0
(30, 35]      5.0
(35, 40]      0.0
(40, 45]      0.0
(45, 50]      0.0
(50, 55]      0.0
(55, 60]      1.0
(60, 65]      0.0
(65, 70]      0.0
(70, 75]      0.0
(75, 80]      0.0
(80, 85]     12.0
(85, 90]      0.0
(90, 95]      0.0
(95, 100]     1.0
Name: count, dtype: float64
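For reference, here is a self-contained version of the step above. It is a minimal sketch assuming a recent pandas release: passing observed=False explicitly keeps the empty bins and avoids the FutureWarning that recent versions emit when it is omitted. It also passes right=False so the bins are [0, 5), [5, 10), ..., which matches the ranges asked for in the question. The sample data is copied from the df shown above.

```python
import numpy as np
import pandas as pd

# Sample data from the answer: each row is a distinct percentage value
# and how often it occurred (i.e. the information value_counts() gives).
df = pd.DataFrame({
    "value": [0.752, 11.769, 22.800, 33.823, 55.365, 84.000, 84.615, 85.000, 99.714],
    "count": [1, 3, 4, 5, 1, 1, 1, 10, 1],
})

# 21 edges -> 20 bins of width 5: 0, 5, 10, ..., 100
edges = np.linspace(0, 100, 21)

# right=False makes the bins left-closed: [0, 5), [5, 10), ..., [95, 100),
# matching the "0 - 4.999, 5 - 9.999, ..." ranges from the question.
bins = pd.cut(df["value"], bins=edges, right=False)

# observed=False keeps every bin, including the empty ones.
binned = df.groupby(bins, observed=False)["count"].sum()
print(binned)
```

With left-closed bins, 85.000 moves into [85, 90) (count 10), and [80, 85) keeps only 84.000 and 84.615 (count 2). A value of exactly 100 would now fall outside the last bin.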

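Alternatively, you can drop the NaNs (the empty bins). Note that recent pandas versions return 0 rather than NaN for the sum of an empty bin, in which case dropna() keeps them: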

In [40]: df.groupby(pd.cut(df.value, bins=np.linspace(0, 100, 21)))['count'].sum().dropna()
Out[40]:
value
(0, 5]        1.0
(10, 15]      3.0
(20, 25]      4.0
(30, 35]      5.0
(55, 60]      1.0
(80, 85]     12.0
(95, 100]     1.0
Name: count, dtype: float64
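In recent pandas you can get the same "non-empty bins only" result without dropna() by passing observed=True to groupby(), which groups only by the intervals that actually occur in the data. A minimal sketch using the answer's right-closed bins and sample data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "value": [0.752, 11.769, 22.800, 33.823, 55.365, 84.000, 84.615, 85.000, 99.714],
    "count": [1, 3, 4, 5, 1, 1, 1, 10, 1],
})

# Right-closed bins (0, 5], (5, 10], ..., (95, 100], as in the answer.
bins = pd.cut(df["value"], bins=np.linspace(0, 100, 21))

# observed=True: only intervals that contain at least one value appear.
non_empty = df.groupby(bins, observed=True)["count"].sum()
print(non_empty)
```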

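Explanation: pd.cut() assigns each value to one of the 20 intervals of width 5 defined by np.linspace(0, 100, 21), and groupby() then sums the counts within each interval. The intervals are right-closed, so 5.000 falls into (0, 5] and 85.000 into (80, 85]; pass right=False to get the [0, 5), [5, 10), ... ranges from the question. A value of exactly 0 falls outside every bin (NaN) unless you also pass include_lowest=True: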

In [41]: pd.cut(df.value, bins=np.linspace(0, 100, 21))
Out[41]:
0       (0, 5]
1     (10, 15]
2     (20, 25]
3     (30, 35]
4     (55, 60]
5     (80, 85]
6     (80, 85]
7     (80, 85]
8    (95, 100]
Name: value, dtype: category
Categories (20, object): [(0, 5] < (5, 10] < (10, 15] < (15, 20] ... (80, 85] < (85, 90] < (90, 95] < (95, 100]]
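If you still have the raw percentage column rather than the value_counts() output, you can skip the intermediate counts: call value_counts(sort=False) on the pd.cut() result, or let np.histogram() count the values per bin. This is a minimal sketch; the column name pct and rebuilding the raw data with np.repeat are assumptions for illustration only. np.histogram() uses half-open [a, b) bins except for the last one, which is closed on both sides, so a value of exactly 100 would be counted in [95, 100].

```python
import numpy as np
import pandas as pd

# Hypothetical raw data: each distinct percentage repeated as often as it
# occurs, rebuilt here from the answer's value/count table.
values = [0.752, 11.769, 22.800, 33.823, 55.365, 84.000, 84.615, 85.000, 99.714]
occurrences = [1, 3, 4, 5, 1, 1, 1, 10, 1]
df = pd.DataFrame({"pct": np.repeat(values, occurrences)})

edges = np.linspace(0, 100, 21)  # 0, 5, 10, ..., 100

# Option 1: bin with pd.cut (left-closed bins), then count per bin.
# sort=False keeps the bins in interval order and includes empty ones.
counts = pd.cut(df["pct"], bins=edges, right=False).value_counts(sort=False)

# Option 2: NumPy returns the same per-bin counts as a plain array.
hist, bin_edges = np.histogram(df["pct"], bins=edges)

print(counts)
print(hist)
```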
