How do I define (and name) intervals for the pandas.cut function?

I want to use the pandas.cut() function with predefined intervals to sort given data into those intervals. I would also like to give the intervals names such as small, moderate and high. I have tried to do this with the following code:

import pandas as pd

CO_simplified = pd.IntervalIndex.from_tuples([(0, 200), (200,250 ), (300, 1000)]) #small,moderate,high
df_dtc_test= pd.DataFrame()
df_dtc_test["CO_simp"] = pd.cut([122,232,333,324,533], len(CO_simplified), labels=CO_simplified)
print(df_dtc_test)

With output:

       CO_simp
0     (0, 200]
1     (0, 200]
2   (200, 250]
3   (200, 250]
4  (300, 1000]

This is not what I expected. The first row looks right, but the second row, with value 232, is also placed in (0, 200], even though 232 lies outside that interval. Besides fixing this incorrect binning, I would like to replace labels such as (0, 200] with names such as "small".
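Why 232 ends up in (0, 200]: when bins is an integer, pd.cut splits the range of the data into that many equal-width bins, and labels only renames those bins in order; it does not define their edges. A minimal sketch that asks pd.cut for the edges it actually used (values copied from the question, retbins=True returns them):

```python
import pandas as pd

values = [122, 232, 333, 324, 533]

# An integer bins argument means "this many equal-width bins spanning
# min(values)..max(values)"; retbins=True also returns the computed edges.
binned, edges = pd.cut(values, 3, retbins=True)

print(edges)          # left edge is nudged slightly below 122
print(binned.codes)   # index of the bin each value fell into
```

With a minimum of 122 and a maximum of 533, the three bins are 137 wide, so the edges are roughly 121.6, 259, 396 and 533. The value 232 is below 259 and lands in the first bin, which the question's labels argument then names (0, 200].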

EDIT: My problem is partially solved (see below). My only remaining question is how to replace the intervals with names.

Does anyone know how I can do this properly?

To put each value into the right interval, pass the IntervalIndex itself as bins rather than its length.

Use

df_dtc_test["CO_simp"] = pd.cut([122,232,333,324,533], CO_simplified, labels=CO_simplified)

instead of

df_dtc_test["CO_simp"] = pd.cut([122,232,333,324,533], len(CO_simplified), labels=CO_simplified)
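To name the intervals, one option is to keep the IntervalIndex as bins and rename the resulting categories with rename_categories. Note that pd.cut ignores labels when bins is an IntervalIndex, so labels=CO_simplified in the corrected line has no effect. A minimal sketch, assuming the names small, moderate and high map to the three intervals in order. The second variant uses plain bin edges plus labels; the edges [0, 200, 250, 1000] are an assumption that closes the 250 to 300 gap by treating it as high:

```python
import pandas as pd

values = [122, 232, 333, 324, 533]
CO_simplified = pd.IntervalIndex.from_tuples([(0, 200), (200, 250), (300, 1000)])
names = ["small", "moderate", "high"]  # one name per interval, in order

# Option 1: bin with the exact intervals, then rename the interval categories.
# Values that fall in no interval (e.g. 250 < x <= 300) become NaN.
binned = pd.cut(values, CO_simplified)
named = binned.rename_categories(names)

# Option 2 (assumed edges): contiguous bin edges plus labels.
# Here moderate is (200, 250] and high is (250, 1000], so 275 would be "high".
named_edges = pd.cut(values, bins=[0, 200, 250, 1000], labels=names)

df_dtc_test = pd.DataFrame(
    {"value": values, "CO_simp": named, "CO_simp_edges": named_edges}
)
print(df_dtc_test)  # both columns: small, moderate, high, high, high
```

Option 1 keeps the intervals exactly as defined, including the gap; option 2 is shorter but only works when the bins are contiguous, so the choice depends on what should happen to values between 250 and 300.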
