How to print categories in pandas.cut?

When you apply pandas.cut to a DataFrame column, the output shows the bin of each element along with Name:, Length:, dtype:, and Categories. I just want the Categories array printed, so that I get only the ranges of the bins I asked for. For example, with bins=4 applied to a DataFrame of the numbers "1,2,3,4,5", I would want the output to show solely the ranges of the four bins, i.e. (1, 2], (2, 3], (3, 4], (4, 5].

Is there any way I can do this? Any approach is fine, even if it doesn't involve printing "Categories".

I guess you just want to get the bins from pd.cut(). If so, you can simply set retbins=True (see the documentation for pd.cut). For example:

In[01]:

import pandas as pd

data = pd.DataFrame({'a': [1, 2, 3, 4, 5]})
cats, bins = pd.cut(data.a, 4, retbins=True)  # retbins=True also returns the bin edges

Out[01]:

cats :

0    (0.996, 2.0]
1    (0.996, 2.0]
2      (2.0, 3.0]
3      (3.0, 4.0]
4      (4.0, 5.0]
Name: a, dtype: category
Categories (4, interval[float64]): [(0.996, 2.0] < (2.0, 3.0] < (3.0, 4.0] < (4.0, 5.0]]

bins :

array([0.996, 2.   , 3.   , 4.   , 5.   ])
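If only the interval labels are wanted, as the question asks, they can also be read from the result itself rather than from the bin edges. The following is a minimal sketch, assuming the same data as above; the names intervals and labels are only illustrative.

```python
import pandas as pd

data = pd.DataFrame({'a': [1, 2, 3, 4, 5]})
cats, bins = pd.cut(data.a, 4, retbins=True)

# The .cat accessor exposes the categories of a categorical Series
# as an IntervalIndex, without the per-element rows or Name/dtype lines.
intervals = cats.cat.categories
print(intervals)

# Plain strings, in case only the printed ranges are needed.
labels = [str(interval) for interval in intervals]
print(labels)
```

Note that the first interval starts slightly below 1 (as in the output above), because pd.cut widens the lowest edge so that the minimum value is included.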

The returned bins can then be reused as you please, e.g. to cut another list with the same edges:

lst = [1, 2, 3]
category = pd.cut(lst, bins)
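As a runnable sketch of this reuse step, the snippet below cuts a second list with the edges returned earlier. The value 6 is an illustrative addition that lies outside the bins, to show that such values come back as NaN rather than raising an error.

```python
import pandas as pd

data = pd.DataFrame({'a': [1, 2, 3, 4, 5]})
cats, bins = pd.cut(data.a, 4, retbins=True)

# Reuse the same edges on new values; 6 is outside the last bin (4.0, 5.0].
lst = [1, 2, 3, 6]
category = pd.cut(lst, bins)

# codes holds the bin position of each value, with -1 marking "no bin" (NaN).
print(category)
print(category.codes)
```

Because pd.cut receives a plain list here, it returns a Categorical rather than a Series, so the categories are available directly as category.categories.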

For anyone who has come here to see how to select a particular bin produced by pd.cut: we can compare the column against a pd.Interval object.

df['bin'] = pd.cut(df['y'], [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])
print(df["bin"].value_counts())

Output:

(0.2, 0.3]    697
(0.4, 0.5]    156
(0.5, 0.6]    122
(0.3, 0.4]     12
(0.6, 0.7]      8
(0.7, 0.8]      4
(0.1, 0.2]      0
(0.8, 0.9]      0
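
To select the rows that fall in one of these bins, compare the column with the matching interval:

print(df.loc[df['bin'] == pd.Interval(0.7, 0.8)])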
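
A self-contained sketch of the same selection follows. The y values and the bin edges [0.0, 0.5, 1.0] are made up for illustration and are not the data behind the counts above. pd.Interval is closed on the right by default, which matches the intervals pd.cut produces by default.

```python
import pandas as pd

# Hypothetical data, chosen only to make the example runnable.
df = pd.DataFrame({'y': [0.2, 0.6, 0.9, 0.4]})
df['bin'] = pd.cut(df['y'], [0.0, 0.5, 1.0])

# Keep only the rows whose bin is exactly the interval (0.5, 1.0].
target = pd.Interval(0.5, 1.0)
selected = df.loc[df['bin'] == target]
print(selected)
```

If pd.cut was called with right=False, the interval would presumably need to be built with closed='left' for the comparison to match.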
