Python: How to assign ranks to categorical variables within a group
Given a dataset containing only the first two columns, how do I create another column in Python that contains the rank of each range, computed separately for each group? My desired output looks like this:
| id | range | rank |
|---|---|---|
| 1 | 10-20 | 2 |
| 1 | 20-30 | 3 |
| 1 | 5-10 | 1 |
| 2 | 20-30 | 2 |
| 2 | 10-20 | 1 |
| 2 | | |
| 3 | 10-20 | 2 |
| 3 | 5-10 | 1 |
| 3 | 20-30 | 3 |
| 3 | 30+ | 4 |
NOTE: These are the only 4 ranges [5-10, 10-20, 20-30, 30+] that can belong to any id, so an id has at most four of them. There can be blanks as well. For example, as in the reproducible example, if id 2 has the two ranges 10-20 and 20-30, then the rank for 10-20 will be 1 and the rank for 20-30 will be 2. I have seen that df.groupby can be used, but I have not been able to figure out how to apply it in this case.
Convert your range column to an ordered category dtype before applying rank:
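import pandas as pd
# Make 'range' an ordered categorical so ranks follow the logical order
df['range'] = df['range'].astype(pd.CategoricalDtype(
    ['5-10', '10-20', '20-30', '30+'], ordered=True))
# transform keeps the original row index, so the result aligns with df
df['rank'] = df.groupby('id')['range'].transform(lambda x: x.rank())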
>>> df
id range rank
0 1 10-20 2.0
1 1 20-30 3.0
2 1 5-10 1.0
3 2 20-30 2.0
4 2 10-20 1.0
5 2 NaN NaN
6 3 10-20 2.0
7 3 5-10 1.0
8 3 20-30 3.0
9 3 30+ 4.0
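
For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch. The sample frame is rebuilt from the table in the question, and the variable names `order` and `codes` are only illustrative. It uses a slight variation on the answer above: instead of ranking the categorical column directly, it ranks the integer codes of the ordered categorical, which may be more convenient if your pandas version handles categoricals inside groupby differently.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question; id 2 has one blank range.
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 2, 3, 3, 3, 3],
    "range": ["10-20", "20-30", "5-10", "20-30", "10-20", np.nan,
              "10-20", "5-10", "20-30", "30+"],
})

# Declare the logical order of the ranges, lowest to highest.
order = ["5-10", "10-20", "20-30", "30+"]
df["range"] = df["range"].astype(pd.CategoricalDtype(order, ordered=True))

# An ordered categorical stores each value as an integer code in the
# declared order (-1 for missing); turn the -1 codes back into NaN.
codes = df["range"].cat.codes.where(df["range"].notna())

# Rank the codes within each id. Blanks stay NaN; method="dense" keeps
# ranks consecutive if the same range appears twice under one id.
df["rank"] = codes.groupby(df["id"]).rank(method="dense")

print(df)
```

The ranks come back as floats because of the NaN row. If integer ranks are preferred, `df["rank"].astype("Int64")` should convert them to pandas' nullable integer type while keeping the blank.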