I'm trying to split a series into buckets of roughly the same size, keeping the order and without putting equal values into different buckets.
I'm using qcut like this:
>>> import pandas as pd
>>> pd.__version__
'0.20.3'
>>> x = [1,1,1,1,1,2,2,2,2,3,4]
>>> pd.qcut(x, 10, duplicates='drop').value_counts()
(0.999, 2.0] 9
(2.0, 3.0] 1
(3.0, 4.0] 1
dtype: int64
I was expecting this to split the first bucket into (0.999, 1.0] and (1.0, 2.0].
Why doesn't it? Is there another approach I should try?
Use cut and specify your own intervals. Values outside the listed edges (here 3 and 4) are left unbinned:
pd.cut(x, [0.999,1,2]).value_counts()
Out[242]:
(0.999, 1.0] 5
(1.0, 2.0] 4
dtype: int64
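As for why qcut did not split the first bucket, here is a rough sketch of what appears to happen. qcut first computes the 11 quantile edges needed for 10 buckets, and duplicates='drop' then removes the repeated edges. For this x the edges should come out as 1 five times, 2 four times, then 3 and 4, so only 1, 2, 3, 4 survive. Since 1 is kept only as the lowest edge, every 1 and 2 falls into the same first interval. The names raw_edges and kept_edges are just for this example.

```python
import numpy as np
import pandas as pd

x = [1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 4]

# The 11 edges that 10 quantile buckets would need (0%, 10%, ..., 100%).
raw_edges = pd.Series(x).quantile(np.linspace(0, 1, 11)).values
print(raw_edges)

# retbins=True also returns the edges qcut kept after dropping duplicates.
buckets, kept_edges = pd.qcut(x, 10, duplicates='drop', retbins=True)
print(kept_edges)
```

Because the lowest edge is inclusive, the first interval is printed as (0.999, 2.0] rather than starting exactly at 1.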
Try pd.cut, as below:
pd.cut(x, 3).value_counts()
(0.997, 2.0] 9
(2.0, 3.0] 1
(3.0, 4.0] 1
Play around with the number of bins you provide. Here I have provided 3 bins, so cut split the range into three equal-width intervals: (0.997, 2.0], (2.0, 3.0], (3.0, 4.0].
If you want to choose the bin edges yourself, pass them explicitly, as below:
bins = [0.999, 1.0, 2.0, 3.0, 4.0]
pd.cut(x, bins).value_counts()
(0.999, 1.0] 5
(1.0, 2.0] 4
(2.0, 3.0] 1
(3.0, 4.0] 1
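If the aim is one bucket per distinct value, the edges could also be derived from the data rather than typed by hand. This is only a sketch: the names distinct, bins and counts are made up for the example, and the 0.001 offset below the minimum is an arbitrary assumption that mirrors the 0.999 edge used above.

```python
import numpy as np
import pandas as pd

x = [1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 4]

# Sorted distinct values of x.
distinct = np.unique(x)

# Add one edge just below the minimum so the smallest value lands in a bin.
# The 0.001 offset is an arbitrary choice for this example.
bins = np.concatenate(([distinct[0] - 0.001], distinct))

counts = pd.cut(x, bins).value_counts()
print(counts)
```

This keeps equal values together, but the buckets are only as balanced as the value frequencies allow.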
Hope this helps.