How do I rename categories after using pandas.cut with an IntervalIndex?

Question

I discretized a column in my dataframe using pandas.cut with bins created by IntervalIndex.from_tuples.

The cut works as intended; however, the categories are shown as the intervals I specified in the IntervalIndex. Is there any way to rename the categories to different labels, e.g. (Small, Medium, Large)?

Example:

import pandas as pd

bins = pd.IntervalIndex.from_tuples([(0, 1), (2, 3), (4, 5)])
pd.cut([0, 0.5, 1.5, 2.5, 4.5], bins)

The resulting categories will be:

[NaN, (0, 1], NaN, (2, 3], (4, 5]]
Categories (3, interval[int64]): [(0, 1] < (2, 3] < (4, 5]]

I am trying to change [(0, 1] < (2, 3] < (4, 5]] into something like 1, 2, 3 or small, medium, large.

Sadly, the labels argument of pd.cut is ignored when bins is an IntervalIndex.

Thanks!

UPDATE:

Thanks to @SergeyBushmanov I noticed that this issue only exists when trying to change category labels inside a dataframe (which is what I am trying to do). Updated example:
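A quick way to check this behavior is sketched below. It assumes a pandas version in which pd.cut silently ignores labels when bins is an IntervalIndex (as the pd.cut documentation describes); the variable name result is only illustrative.

```python
import pandas as pd

bins = pd.IntervalIndex.from_tuples([(0, 1), (2, 3), (4, 5)])

# labels is passed, but the bins are an IntervalIndex,
# so the resulting categories are still the intervals themselves.
result = pd.cut([0.5, 2.5, 4.5], bins, labels=["small", "medium", "large"])

print(result.categories)
```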

In [1]: df = pd.DataFrame([0, 0.5, 1.5, 2.5, 4.5], columns = ['col1'])
In [2]: bins = pd.IntervalIndex.from_tuples([(0, 1), (2, 3), (4, 5)])
In [3]: df['col1'] = pd.cut(df['col1'], bins)
In [4]: df['col1'].categories = ['small','med','large']

In [5]: df['col1']

Out [5]:
0       NaN
1    (0, 1]
2       NaN
3    (2, 3]
4    (4, 5]
Name: col1, dtype: category
Categories (3, interval[int64]): [(0, 1] < (2, 3] < (4, 5]]

Answer 1

If we have some data:

bins = pd.IntervalIndex.from_tuples([(0, 1), (2, 3), (4, 5)])
x = pd.cut([0, 0.5, 1.5, 2.5, 4.5], bins)

You may try re-assigning categories like:

In [7]: x.categories = [1,2,3]

In [8]: x   
Out[8]: 
[NaN, 1, NaN, 2, 3]
Categories (3, int64): [1 < 2 < 3]

or:

In [9]: x.categories = ["small", "medium", "big"]                         

In [10]: x                                             
Out[10]: 
[NaN, small, NaN, medium, big]
Categories (3, object): [small < medium < big]

UPDATE:

df = pd.DataFrame([0, 0.5, 1.5, 2.5, 4.5], columns = ['col1'])
bins = pd.IntervalIndex.from_tuples([(0, 1), (2, 3), (4, 5)])
x = pd.cut(df["col1"].to_list(),bins)
x.categories = [1,2,3]
df['col1'] = x
df.col1
0    NaN
1      1
2    NaN
3      2
4      3
Name: col1, dtype: category
Categories (3, int64): [1 < 2 < 3]

UPDATE 2:

In newer versions of pandas, instead of reassigning categories with x.categories = [1, 2, 3], rename_categories should be used (x.rename_categories when x is a Categorical, or x.cat.rename_categories when x is a Series):

labels = [1, 2, 3]
x = x.rename_categories(labels)

labels can be of any type, and in any case the original categorical order that was set when creating the pd.IntervalIndex will be preserved.
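For the dataframe case from the question, a minimal sketch could look like the following. It assumes the column is cut as a Series, so renaming goes through the .cat accessor; the intermediate name binned is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"col1": [0, 0.5, 1.5, 2.5, 4.5]})
bins = pd.IntervalIndex.from_tuples([(0, 1), (2, 3), (4, 5)])

# pd.cut on a Series returns a Series with a categorical dtype
binned = pd.cut(df["col1"], bins)

# Rename the interval categories positionally; the category order is kept
df["col1"] = binned.cat.rename_categories(["small", "medium", "large"])

print(df["col1"])
```

Values that fall outside every interval (here 0 and 1.5) stay NaN after the rename.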

Answer 2

import pandas as pd

series = pd.Series([0, 0.5, 1.5, 2.5, 4.5])

bins = [(0, 1), (2, 3), (4, 5)]
index = pd.IntervalIndex.from_tuples(bins)
intervals = index.values
names = ['small', 'med', 'large']
# Map each Interval category to its new label
to_name = {interval: name for interval, name in zip(intervals, names)}

# Cut with the IntervalIndex, then rename the categories via the mapping
named_series = pd.Series(
    pd.CategoricalIndex(pd.cut(series, index)).rename_categories(to_name)
)
print(named_series)

0      NaN
1    small
2      NaN
3      med
4    large
dtype: category
Categories (3, object): ['small' < 'med' < 'large']
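The same interval-to-name mapping can probably be applied without the CategoricalIndex round trip, since pd.cut on a Series already returns a categorical Series. This is a sketch under that assumption, not necessarily what the answer intended; the name named is hypothetical.

```python
import pandas as pd

series = pd.Series([0, 0.5, 1.5, 2.5, 4.5])
index = pd.IntervalIndex.from_tuples([(0, 1), (2, 3), (4, 5)])

# Dict keyed by Interval objects: old category -> new label
to_name = dict(zip(index, ["small", "med", "large"]))

# rename_categories accepts a dict-like mapping from old to new categories
named = pd.cut(series, index).cat.rename_categories(to_name)

print(named)
```

A dict makes the pairing between each interval and its label explicit, whereas a list relies on the labels being given in category order.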

Answer 1
10 Accepted 2019-03-17 07:01:12

Answer 2
0 2020-11-05 04:54:40
