Binning data into equally sized bins

I would like to bin values into equally sized bins. Let's assume that we have the following Pandas Series:

import pandas as pd

ex = pd.Series([1, 2, 3, 4, 5, 6, 7, 888, 999])

Now, I would like to create three bins:

pd.cut(ex, 3, labels=False)

This results in three bins and the following bin number assigned to each element of the series:

[0,0,0,0,0,0,0,2,2]
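
To see why the first seven values end up in the same bin, it can help to look at the edges pd.cut computes. The sketch below only uses the retbins=True argument of pd.cut; the edges split the value range into intervals of roughly equal width, not equal count, so the two large values push almost everything else into the first interval.

```python
import pandas as pd

ex = pd.Series([1, 2, 3, 4, 5, 6, 7, 888, 999])

# retbins=True returns the bin codes together with the computed edges.
codes, edges = pd.cut(ex, 3, labels=False, retbins=True)

print(codes.tolist())  # bin number per element, as in the question
print(edges)           # four edges delimiting three roughly equal-width intervals
```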

Instead, I would like the bin borders to be chosen such that each bin has an equal number of elements (i.e. 3), so that the assignment of the data points to the bins looks like:

[0,0,0,1,1,1,2,2,2]

How can I achieve this? And what should be done about tie breaking (i.e. when the number of data points is not divisible by the number of bins)?

Use pd.qcut, which cuts at quantiles so that each bin receives roughly the same number of elements:

pd.qcut(ex, 3, labels=False)

Output

0    0
1    0
2    0
3    1
4    1
5    1
6    2
7    2
8    2
dtype: int64

Pass retbins=True to also get the bin edges.

pd.qcut(ex, 3, labels=False, retbins=True)

Output

(0    0
 1    0
 2    0
 3    1
 4    1
 5    1
 6    2
 7    2
 8    2
 dtype: int64,
 array([  1.        ,   3.66666667,   6.33333333, 999.        ]))
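
On the tie-breaking part of the question, here is a minimal sketch of how qcut behaves in two awkward cases. The example Series s and dup are made up for illustration and are not from the question. When the length is not divisible by the number of bins, the bin sizes simply come out uneven. When many values are identical, the quantile edges can coincide and qcut may refuse to bin the raw values; one common workaround is to bin the ranks instead, using rank(method="first") to break ties by order of appearance.

```python
import pandas as pd

# Ten values cannot be split evenly into three bins, so sizes are uneven.
s = pd.Series([1, 2, 3, 4, 5, 6, 7, 888, 999, 1000])
codes = pd.qcut(s, 3, labels=False)
print(codes.value_counts().sort_index())

# Many repeated values: the quantile edges collide.
dup = pd.Series([1, 1, 1, 1, 1, 2, 3, 4, 5])
try:
    pd.qcut(dup, 3, labels=False)
except ValueError as err:
    print("qcut on the raw values failed:", err)

# Ranking with method="first" gives every element a distinct rank,
# so the ranks can always be cut into equal-count bins.
codes_dup = pd.qcut(dup.rank(method="first"), 3, labels=False)
print(codes_dup.tolist())
```

Note that binning on ranks puts equal values into different bins when a bin boundary falls inside a run of duplicates; whether that is acceptable depends on the use case.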

Use the pandas qcut function instead: pd.qcut(ex, q=3, labels=False)

If the values are already sorted and the Series has the default integer index, you can bin by position:

bins = ex.index//3 # np.arange(len(ex))//3
bins
Out[98]: Int64Index([0, 0, 0, 1, 1, 1, 2, 2, 2], dtype='int64')
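
The positional idea can be extended to unsorted data, or to lengths that are not a multiple of the bin count, by working on each value's position in sorted order rather than on the index. This is only a sketch: the shuffled Series and the names n_bins and order are assumptions made for illustration.

```python
import pandas as pd

# Same values as in the question, but shuffled.
ex = pd.Series([999, 3, 1, 888, 5, 2, 7, 4, 6])
n_bins = 3

# Zero-based position of each value in sorted order; ties keep original order.
order = ex.rank(method="first").astype(int) - 1

# Scale positions 0..n-1 onto bin numbers 0..n_bins-1.
bins = order * n_bins // len(ex)
print(bins.tolist())  # [2, 0, 0, 2, 1, 0, 2, 1, 1]
```

With this integer arithmetic the bin sizes differ by at most one when len(ex) is not divisible by n_bins.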
