Binning all values with pandas.cut

Question

I have a dataframe that looks like the following:

index  value
1      21.046091
2      52.400000
3      14.082153
4      1.859942
5      1.859942
6      2.331143
7      9.060000
8      0.789265
9      12967.7

The last value is much higher than the rest. I'm trying to bin all the values into 5 bins using pd.cut:

pd.cut(df['value'], 5, labels = [1,2,3,4,5])

But it only ends up returning groups 1 and 5.

index   value   group
0       0.410000    1
1       21.046091   1
2       52.400000   1
3       14.082153   1
4       1.859942    1
5       1.859942    1
6       2.331143    1
7       9.060000    1
8       0.789265    1
9       12967.7     5

The high value is clearly throwing it off, but is there a way to ensure that all five bins are represented in the dataframe without getting rid of the outlying values?

Answer 1

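pd.cut splits the range between the minimum and maximum into equal-width intervals, so a single large outlier pushes almost every other value into the first bin. You could use qcut instead, which places the bin edges at quantiles so that each bin receives roughly the same number of rows: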

pd.qcut(df['value'],5,labels=[1,2,3,4,5])

Output:

index
1    4
2    5
3    4
4    1
5    1
6    2
7    3
8    1
9    5
Name: value, dtype: category
Categories (5, int64): [1 < 2 < 3 < 4 < 5]

print(df.assign(group = pd.qcut(df['value'],5,labels=[1,2,3,4,5])))

              value group
index                    
1         21.046091     4
2         52.400000     5
3         14.082153     4
4          1.859942     1
5          1.859942     1
6          2.331143     2
7          9.060000     3
8          0.789265     1
9      12967.700000     5
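
To see where the edges actually fall, both functions accept retbins=True and return the computed edges alongside the labels. The following is a minimal sketch, assuming the nine values with indices 1 to 9 shown in the question; the names cut_groups, cut_edges, qcut_groups and qcut_edges are illustrative and not from the original post.

```python
import pandas as pd

# Values copied from the question (indices 1-9).
df = pd.DataFrame(
    {"value": [21.046091, 52.4, 14.082153, 1.859942, 1.859942,
               2.331143, 9.06, 0.789265, 12967.7]},
    index=pd.RangeIndex(1, 10, name="index"),
)
labels = [1, 2, 3, 4, 5]

# pd.cut: five equal-width intervals spanning min..max of the column.
cut_groups, cut_edges = pd.cut(df["value"], 5, labels=labels, retbins=True)

# pd.qcut: edges placed at the 0/20/40/60/80/100th percentiles.
qcut_groups, qcut_edges = pd.qcut(df["value"], 5, labels=labels, retbins=True)

print("cut edges: ", cut_edges)
print("qcut edges:", qcut_edges)
print(df.assign(cut_group=cut_groups, qcut_group=qcut_groups))
```

With this data the first interior edge from pd.cut lands in the thousands, far above 52.4, which is why every row except the last ends up in group 1. Note that qcut trades equal-width bins for roughly equal-count bins, so the group number reflects rank rather than magnitude.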
