Binning all values with pandas.cut

I have a dataframe that looks like the following:

index  value
1      21.046091
2      52.400000
3      14.082153
4      1.859942
5      1.859942
6      2.331143
7      9.060000
8      0.789265
9      12967.7

The last value is much higher than the rest. I'm trying to bin all the values into 5 bins using pd.cut:

pd.cut(df['value'], 5, labels = [1,2,3,4,5])

But it only ends up returning the groups 1 and 5.

index   value   group
0       0.410000    1
1       21.046091   1
2       52.400000   1
3       14.082153   1
4       1.859942    1
5       1.859942    1
6       2.331143    1
7       9.060000    1
8       0.789265    1
9       12967.7     5

The higher value is clearly throwing it off, but is there a way to ensure that all five bins are represented in the dataframe without getting rid of the outlying values?

You could use qcut, which bins by quantiles so that each bin holds roughly the same number of values:

pd.qcut(df['value'],5,labels=[1,2,3,4,5])

Output:

index
1    4
2    5
3    4
4    1
5    1
6    2
7    3
8    1
9    5
Name: value, dtype: category
Categories (5, int64): [1 < 2 < 3 < 4 < 5]
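
To see where these groups come from, it can help to look at the bin edges qcut picks. The following is a minimal sketch, assuming the nine values from the question with index 1 to 9; the names values, groups, edges and expected are only illustrative. It asks qcut to return its edges via retbins=True and compares them with the 0%, 20%, ..., 100% quantiles of the column.

```python
import numpy as np
import pandas as pd

# The nine values from the question, indexed 1..9 (assumed layout).
values = pd.Series(
    [21.046091, 52.4, 14.082153, 1.859942, 1.859942,
     2.331143, 9.06, 0.789265, 12967.7],
    index=range(1, 10),
    name="value",
)

# retbins=True also returns the bin edges that qcut computed.
groups, edges = pd.qcut(values, 5, labels=[1, 2, 3, 4, 5], retbins=True)

# The edges should coincide with the quantiles of the data itself.
expected = values.quantile([0, 0.2, 0.4, 0.6, 0.8, 1.0]).to_numpy()

print(edges)
print(np.allclose(edges, expected))
```

Because the edges follow the data rather than the min-to-max range, the single large value only stretches the last bin instead of emptying the middle ones.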

print(df.assign(group = pd.qcut(df['value'],5,labels=[1,2,3,4,5])))

              value group
index                    
1         21.046091     4
2         52.400000     5
3         14.082153     4
4          1.859942     1
5          1.859942     1
6          2.331143     2
7          9.060000     3
8          0.789265     1
9      12967.700000     5
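
For comparison, here is a rough sketch of why pd.cut only produced groups 1 and 5. It assumes the same nine values as in the question; names such as width_groups and quant_groups are made up for illustration. With an integer number of bins, pd.cut splits the range from the minimum to the maximum into equal-width intervals, so one very large value makes the first interval wide enough to swallow everything else.

```python
import pandas as pd

# The nine values from the question, indexed 1..9 (assumed layout).
df = pd.DataFrame(
    {"value": [21.046091, 52.4, 14.082153, 1.859942, 1.859942,
               2.331143, 9.06, 0.789265, 12967.7]},
    index=pd.RangeIndex(1, 10, name="index"),
)
labels = [1, 2, 3, 4, 5]

# Equal-width bins: edges are evenly spaced between min and max.
width_groups, width_edges = pd.cut(df["value"], 5, labels=labels, retbins=True)

# Equal-frequency bins: edges are quantiles of the data.
quant_groups = pd.qcut(df["value"], 5, labels=labels)

# Count how many rows land in each group, including empty groups.
width_counts = width_groups.value_counts().sort_index()
quant_counts = quant_groups.value_counts().sort_index()

print(width_edges)   # each bin is roughly 2593 units wide
print(width_counts)
print(quant_counts)
```

With pd.cut the first bin already reaches to about 2594, so eight of the nine rows fall into group 1 and groups 2 to 4 stay empty. With pd.qcut every group gets at least one row, although the counts are not perfectly equal here because nine rows do not divide evenly into five bins and two values are tied.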
