
pandas cut into intervals hyper-parameters

I am trying to match several data frames on an interval column that is the result of a pd.cut() call. However, the matching doesn't work because pd.cut() produces different outcomes.

For example, when cutting a series of floats into the bins [15, 16, 17, 18], pd.cut() sometimes produces the following intervals (option A):

(15, 16], (16, 17], (17, 18]

and sometimes it produces the following intervals (option B):

(15.0, 16.0], (16.0, 17.0], (17.0, 18.0]

Changing parameters such as precision doesn't help. The funny thing is that for the option B result, when you group by the intervals, the group names actually look like option A: (15, 16], (16, 17], (17, 18]

Which hyper parameters should I use for the pd.cut() function?

Yup, a possible solution is to manually pass labels for the pd.cut() intervals, so the keys are identical strings in every data frame.

df['a_groups'] = pd.cut(df.a, bins=[15, 16, 17, 18], labels=['(15, 16]', '(16, 17]', '(17, 18]'])
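To avoid typing the labels by hand, they can be generated from the bin edges and reused for every data frame. The following is only a minimal sketch of that idea: the helper name add_groups, the frames left and right, and the columns x and y are made up for illustration, and the group column is converted to plain strings on the assumption that you only need it as a merge key.

```python
import pandas as pd

edges = [15, 16, 17, 18]
# Build one string label per bin from the shared edges,
# e.g. "(15, 16]", so every frame gets identical keys.
labels = [f"({lo}, {hi}]" for lo, hi in zip(edges[:-1], edges[1:])]

def add_groups(df, column="a"):
    """Return a copy of df with an 'a_groups' column of string bin labels."""
    out = df.copy()
    # astype(str) turns the categorical into plain strings;
    # values outside the bins would become the string "nan".
    out["a_groups"] = pd.cut(out[column], bins=edges, labels=labels).astype(str)
    return out

# Hypothetical example frames.
left = add_groups(pd.DataFrame({"a": [15.2, 16.7, 17.9], "x": [1, 2, 3]}))
right = add_groups(pd.DataFrame({"a": [15.5, 16.1, 17.3], "y": [10, 20, 30]}))

merged = left.merge(right, on="a_groups", suffixes=("_left", "_right"))
print(merged[["a_groups", "x", "y"]])
```

Because both frames are cut with the same edges and the same labels, the merge key no longer depends on how pandas chooses to render the interval endpoints.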
