Python: How to assign ranks to a categorical variable within groups

Question

Given a dataset containing only the first two columns, how do I use Python to create another column that holds the rank of each range, computed separately within each group? My desired output would look like this:

id	range	rank
1	10-20	2
1	20-30	3
1	5-10	1
2	20-30	2
2	10-20	1
2
3	10-20	2
3	5-10	1
3	20-30	3
3	30+	4

NOTE: These are the only 4 ranges [5-10, 10-20, 20-30, 30+] that can occur, so any id has at most four of them. There can be blanks as well. For example, as in the reproducible example, if id 2 has the two ranges 10-20 and 20-30, then 10-20 gets rank 1 and 20-30 gets rank 2. I have seen that df.groupby can be used, but I cannot figure out how to apply it in this case.

Answer 1

Convert your range column to an ordered category dtype before applying rank:

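import pandas as pd

# Ordered categorical so ranks follow the range order, not string order
df['range'] = df['range'].astype(pd.CategoricalDtype(
                  ['5-10', '10-20', '20-30', '30+'], ordered=True))

# group_keys=False keeps the original index so the result aligns with df
df['rank'] = df.groupby('id', group_keys=False)['range'].apply(lambda x: x.rank())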

>>> df
   id  range  rank
0   1  10-20   2.0
1   1  20-30   3.0
2   1   5-10   1.0
3   2  20-30   2.0
4   2  10-20   1.0
5   2    NaN   NaN
6   3  10-20   2.0
7   3   5-10   1.0
8   3  20-30   3.0
9   3    30+   4.0
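
As an alternative to the categorical approach, the same ranks can be obtained by mapping each range label to its position in the ordering and ranking those numbers within each id. This is only a sketch: the `order` dictionary and the `position` variable are hypothetical names introduced here, and the sample frame is rebuilt from the question's table with `None` assumed for the blank range.

```python
import pandas as pd

# Sample data mirroring the question's table; None marks the blank range.
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 2, 3, 3, 3, 3],
    "range": ["10-20", "20-30", "5-10", "20-30", "10-20", None,
              "10-20", "5-10", "20-30", "30+"],
})

# Hypothetical lookup: each range label's position in the intended order.
order = {"5-10": 0, "10-20": 1, "20-30": 2, "30+": 3}

# Map labels to numeric positions; the blank has no entry and becomes NaN.
position = df["range"].map(order)

# Rank the positions within each id; NaN positions keep a NaN rank.
df["rank"] = position.groupby(df["id"]).rank(method="dense")

print(df)
```

With an explicit mapping the ordering lives in one visible place and the `range` column keeps its original string dtype, at the cost of a separate lookup that has to be kept in sync with the labels.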

Solution 1
2 Accepted 2021-07-06 10:24:35