Grouping values in pandas value_counts()

Question

I want to create a histogram from my pandas DataFrame. I have one column where I store percentage values. I used value_counts(), but I get too many distinct percentage values. Example:

0.752        1
0.769        2
0.800        1
0.823        1
          ... 
80.365       1
84.000       1
84.615       1
85.000       10
85.714       1

I need to group these values into ranges of equal width, for example 5% (0 - 4,999, 5,000 - 9,999, ...). I want this result:

(Example)

0  - 4,999       24
5  - 9,999       12
10 - 14,999      30
...

Answer 1

You can group your data by the result of the pd.cut() method:

In [38]: df
Out[38]:
    value  count
0   0.752      1
1  11.769      3
2  22.800      4
3  33.823      5
4  55.365      1
5  84.000      1
6  84.615      1
7  85.000     10
8  99.714      1

In [39]: df.groupby(pd.cut(df.value, bins=np.linspace(0, 100, 21)))['count'].sum().fillna(0)
Out[39]:
value
(0, 5]        1.0
(5, 10]       0.0
(10, 15]      3.0
(15, 20]      0.0
(20, 25]      4.0
(25, 30]      0.0
(30, 35]      5.0
(35, 40]      0.0
(40, 45]      0.0
(45, 50]      0.0
(50, 55]      0.0
(55, 60]      1.0
(60, 65]      0.0
(65, 70]      0.0
(70, 75]      0.0
(75, 80]      0.0
(80, 85]     12.0
(85, 90]      0.0
(90, 95]      0.0
(95, 100]     1.0
Name: count, dtype: float64
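For anyone who wants to reproduce the session above, here is a minimal self-contained sketch. The DataFrame is rebuilt from the values shown in Out[38]. Passing `observed=False` explicitly is my own addition: the transcript comes from an older pandas, and in recent versions I would expect the empty bins to come out as 0 directly, so the fillna(0) step should not be needed there.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame shown in Out[38].
df = pd.DataFrame({
    "value": [0.752, 11.769, 22.800, 33.823, 55.365, 84.000, 84.615, 85.000, 99.714],
    "count": [1, 3, 4, 5, 1, 1, 1, 10, 1],
})

# 21 evenly spaced edges from 0 to 100 give 20 bins of width 5.
bins = np.linspace(0, 100, 21)

# observed=False (an explicit choice, not in the original answer) keeps
# every bin in the result, including the ones with no rows.
grouped = df.groupby(pd.cut(df["value"], bins=bins), observed=False)["count"].sum()

print(grouped)
```

The bins here are closed on the right, so 85.000 is counted in (80, 85], which is why that bin sums to 12.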

Alternatively, you can drop the NaNs:

In [40]: df.groupby(pd.cut(df.value, bins=np.linspace(0, 100, 21)))['count'].sum().dropna()
Out[40]:
value
(0, 5]        1.0
(10, 15]      3.0
(20, 25]      4.0
(30, 35]      5.0
(55, 60]      1.0
(80, 85]     12.0
(95, 100]     1.0
Name: count, dtype: float64
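If the empty bins show up as 0 rather than NaN in your pandas version, dropna() will not remove them. One possible way to get the same "non-empty bins only" result is `observed=True`, sketched below on the same example data. This is an alternative I am suggesting, not what the transcript above does.

```python
import numpy as np
import pandas as pd

# Same example data as in Out[38].
df = pd.DataFrame({
    "value": [0.752, 11.769, 22.800, 33.823, 55.365, 84.000, 84.615, 85.000, 99.714],
    "count": [1, 3, 4, 5, 1, 1, 1, 10, 1],
})

bins = np.linspace(0, 100, 21)

# observed=True keeps only the bins that actually contain at least one row.
non_empty = df.groupby(pd.cut(df["value"], bins=bins), observed=True)["count"].sum()

print(non_empty)
```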

Explanation:

In [41]: pd.cut(df.value, bins=np.linspace(0, 100, 21))
Out[41]:
0       (0, 5]
1     (10, 15]
2     (20, 25]
3     (30, 35]
4     (55, 60]
5     (80, 85]
6     (80, 85]
7     (80, 85]
8    (95, 100]
Name: value, dtype: category
Categories (20, object): [(0, 5] < (5, 10] < (10, 15] < (15, 20] ... (80, 85] < (85, 90] < (90, 95] < (95, 100]]
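Note that the bins above are closed on the right, (0, 5], (5, 10], ..., while the question asks for 0 - 4,999, 5,000 - 9,999, ..., that is, bins closed on the left. A rough sketch of that variant, working directly on the raw percentage column instead of the value_counts() output, could look like this. The `raw` Series is made-up illustration data, and the extra edge at 105 is my assumption so that a value of exactly 100 still falls into a bin.

```python
import numpy as np
import pandas as pd

# Hypothetical raw column of percentage values (not taken from the question).
raw = pd.Series([0.0, 0.752, 4.999, 5.0, 84.615, 85.0, 85.0, 100.0])

# Edges 0, 5, ..., 105. The last edge is an assumption so that 100 lands in [100, 105).
edges = np.arange(0, 106, 5)

# right=False makes every bin closed on the left: [0, 5), [5, 10), ...
binned = pd.cut(raw, bins=edges, right=False)

# sort=False keeps the bins in ascending order instead of sorting by frequency.
counts = binned.value_counts(sort=False)

print(counts)
```

With `right=False`, 4.999 is counted in [0, 5) and 5.0 in [5, 10), which matches the ranges in the question. From there, something like `counts.plot.bar()` should give the histogram-style plot.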

Answer 1: accepted; timestamp 2016-10-10 16:09:44
