Grouping values in Pandas value_counts()
I want to create a histogram from my pandas DataFrame. I have one column where I store percentage values. I used value_counts(), but there are too many distinct percentage values. Example:
0.752 1
0.769 2
0.800 1
0.823 1
...
80.365 1
84.000 1
84.615 1
85.000 10
85.714 1
I need to group these values into bins of equal width, for example 5% (0 - 4.999, 5.000 - 9.999, ...). I want this result:
(Example)
0 - 4.999 24
5 - 9.999 12
10 - 14.999 30
...
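For reference, a two-column frame like the value_counts() output above can be built from a raw column of percentages roughly as follows. This is a minimal sketch: the column name `pct` and the sample values are hypothetical.

```python
import pandas as pd

# Hypothetical raw data: one row per observation, column name "pct" is assumed
raw = pd.DataFrame({"pct": [0.752, 0.769, 0.769, 0.800, 85.000, 85.000, 84.615]})

# value_counts() yields one row per distinct value, which is why the
# resulting histogram is far too fine-grained
counts = (
    raw["pct"]
    .value_counts()
    .sort_index()                 # order rows by percentage value
    .rename_axis("value")         # name the index so it becomes a column
    .reset_index(name="count")    # two columns: value, count
)
print(counts)
```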
You can group your data by the result of the pd.cut() method:
In [38]: df
Out[38]:
value count
0 0.752 1
1 11.769 3
2 22.800 4
3 33.823 5
4 55.365 1
5 84.000 1
6 84.615 1
7 85.000 10
8 99.714 1
In [39]: df.groupby(pd.cut(df.value, bins=np.linspace(0, 100, 21)))['count'].sum().fillna(0)
Out[39]:
value
(0, 5] 1.0
(5, 10] 0.0
(10, 15] 3.0
(15, 20] 0.0
(20, 25] 4.0
(25, 30] 0.0
(30, 35] 5.0
(35, 40] 0.0
(40, 45] 0.0
(45, 50] 0.0
(50, 55] 0.0
(55, 60] 1.0
(60, 65] 0.0
(65, 70] 0.0
(70, 75] 0.0
(75, 80] 0.0
(80, 85] 12.0
(85, 90] 0.0
(90, 95] 0.0
(95, 100] 1.0
Name: count, dtype: float64
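A self-contained version of In [38] and In [39] is sketched below, assuming a recent pandas. Since pandas 0.22, summing a group with no rows gives 0 rather than NaN, so empty bins already show 0 and the `.fillna(0)` step is not needed there. `observed=False` is passed explicitly so the empty bins stay in the result.

```python
import numpy as np
import pandas as pd

# The (value, count) frame from In [38]
df = pd.DataFrame({
    "value": [0.752, 11.769, 22.800, 33.823, 55.365, 84.000, 84.615, 85.000, 99.714],
    "count": [1, 3, 4, 5, 1, 1, 1, 10, 1],
})

# 21 edges -> 20 right-closed bins of width 5: (0, 5], (5, 10], ..., (95, 100]
bins = np.linspace(0, 100, 21)

# observed=False keeps the empty bins in the output (and avoids the
# FutureWarning that newer pandas emits when grouping by a categorical)
per_bin = df.groupby(pd.cut(df["value"], bins=bins), observed=False)["count"].sum()
print(per_bin)
```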
Alternatively, you can drop the NaNs:
In [40]: df.groupby(pd.cut(df.value, bins=np.linspace(0, 100, 21)))['count'].sum().dropna()
Out[40]:
value
(0, 5] 1.0
(10, 15] 3.0
(20, 25] 4.0
(30, 35] 5.0
(55, 60] 1.0
(80, 85] 12.0
(95, 100] 1.0
Name: count, dtype: float64
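On pandas 0.22 and later the empty bins hold 0 rather than NaN, so `.dropna()` no longer removes them. A sketch of an equivalent filter, assuming the same data as In [38]:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "value": [0.752, 11.769, 22.800, 33.823, 55.365, 84.000, 84.615, 85.000, 99.714],
    "count": [1, 3, 4, 5, 1, 1, 1, 10, 1],
})

per_bin = df.groupby(
    pd.cut(df["value"], bins=np.linspace(0, 100, 21)), observed=False
)["count"].sum()

# Keep only bins that actually contain values (what dropna() did on old pandas)
non_empty = per_bin[per_bin > 0]
print(non_empty)
```

Passing `observed=True` to `groupby` instead should likewise restrict the result to the non-empty bins.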
Explanation:
In [41]: pd.cut(df.value, bins=np.linspace(0, 100, 21))
Out[41]:
0 (0, 5]
1 (10, 15]
2 (20, 25]
3 (30, 35]
4 (55, 60]
5 (80, 85]
6 (80, 85]
7 (80, 85]
8 (95, 100]
Name: value, dtype: category
Categories (20, object): [(0, 5] < (5, 10] < (10, 15] < (15, 20] ... (80, 85] < (85, 90] < (90, 95] < (95, 100]]
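pd.cut() builds right-closed bins by default, so 85.000 lands in (80, 85]. The question asks for 0 - 4.999, 5 - 9.999, ..., which are left-closed bins. The sketch below uses `right=False` and counts the raw percentages directly with `value_counts()`, skipping the intermediate (value, count) frame; the sample values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical raw column of percentages (one row per observation)
pct = pd.Series([0.752, 0.769, 0.769, 4.999, 5.0, 11.2, 84.615, 85.0, 85.0])

# right=False makes the bins left-closed: [0, 5), [5, 10), ..., [95, 100)
# so 4.999 falls in the first bin and 5.0 in the second, as in the question
edges = np.arange(0, 105, 5)
binned = pd.cut(pct, bins=edges, right=False)

# sort=False keeps the bins in interval order; empty bins are listed with 0
hist = binned.value_counts(sort=False)
print(hist)
```

With these edges a value of exactly 100 falls outside [95, 100) and becomes NaN; extending the edges to 105 would give it a bin of its own.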