[英]Binning all values with pandas.cut
I have a dataframe that looks like the following: 我有一个数据框,如下所示:
index value
1 21.046091
2 52.400000
3 14.082153
4 1.859942
5 1.859942
6 2.331143
7 9.060000
8 0.789265
9 12967.7
The last value is much higher than the rest. 最后一个值比其他值高得多。 I'm trying to bin all the values into 5 bins using pd.cut:
我正在尝试使用pd.cut将所有值合并到5个容器中:
pd.cut(df['value'], 5, labels = [1,2,3,4,5])
But it only ends up returning the groups 1 and 5. 但这只会返回第1组和第5组。
index value group
0 0.410000 1
1 21.046091 1
2 52.400000 1
3 14.082153 1
4 1.859942 1
5 1.859942 1
6 2.331143 1
7 9.060000 1
8 0.789265 1
9 12967.7 5
The higher value is clearly throwing it, but is there a way to ensure that all five bins are represented in the dataframe without getting rid of outlying values? 较高的值显然会抛出该值,但是有没有办法确保所有五个bin都在数据帧中表示出来而又不会脱离外围值?
You could use qcut
: 您可以使用
qcut
:
pd.qcut(df['value'],5,labels=[1,2,3,4,5])
Output: 输出:
index
1 4
2 5
3 4
4 1
5 1
6 2
7 3
8 1
9 5
Name: value, dtype: category
Categories (5, int64): [1 < 2 < 3 < 4 < 5]
print(df.assign(group = pd.qcut(df['value'],5,labels=[1,2,3,4,5])))
value group
index
1 21.046091 4
2 52.400000 5
3 14.082153 4
4 1.859942 1
5 1.859942 1
6 2.331143 2
7 9.060000 3
8 0.789265 1
9 12967.700000 5
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.