Creating a concatenated name-and-rank column in Pandas

Question

I have this dataset, which has names and counts:

import pandas as pd

df = pd.DataFrame({'Id':[1,2,3,4,5,6], 'Name':['Eve','Diana','Diana','Mia','Eve','Eve'], "Count":[10,3,14,8,5,2]})
df

    Id  Name    Count
0   1   Eve     10
1   2   Diana   3
2   3   Diana   14
3   4   Mia     8
4   5   Eve     5
5   6   Eve     2

And I want to create a new column that concatenates the name with its ranking. So first I have to select the non-unique names and sort them:

df_nounique = df[df.duplicated(subset=['Name'], keep=False)]
df_nounique = df_nounique.sort_values(by=['Name','Count'], ascending=False)
df_nounique
    Id  Name    Count
0   1   Eve    10
4   5   Eve    5
5   6   Eve    2
2   3   Diana  14
1   2   Diana  3

OK, now I have to assign the ranking based on the name and count:

df_nounique['rank'] = df_nounique.groupby('Name')['Count'].rank()
df_nounique
    Id  Name    Count   rank
0   1   Eve     10      3.0
4   5   Eve     5       2.0
5   6   Eve     2       1.0
2   3   Diana   14      2.0
1   2   Diana   3       1.0

But this is where I am stuck. For the first row the rank should be 1, but I get 3! If I get this right, I can merge and concatenate to obtain this:

    Id  Name    Count   New_col
0   1   Eve     10      Eve_1
1   2   Diana   3       Diana_2
2   3   Diana   14      Diana_1
3   4   Mia     8       Mia
4   5   Eve     5       Eve_2
5   6   Eve     2       Eve_3

It seems that I am taking too many steps, so could you please help me at least with my rank problem, and suggest a better approach for my ultimate goal?

Answer 1

Use ascending=False as an argument of rank():

df_nounique['rank'] = df_nounique.groupby('Name')['Count'] \
                                 .rank(ascending=False).astype(int)

>>> df_nounique
   Id   Name  Count  rank
0   1    Eve     10     1
4   5    Eve      5     2
5   6    Eve      2     3
2   3  Diana     14     1
1   2  Diana      3     2
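As a side note, rank() sorts in ascending order by default, which is why the largest count received the highest rank number in the question. The sketch below compares the two directions on the same data and then looks at a tie. The tied frame is made up for illustration and is not part of the original dataset. With the default method='average', tied values get fractional ranks such as 1.5, which astype(int) would truncate; method='first' is one way to keep whole numbers.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Eve', 'Diana', 'Diana', 'Mia', 'Eve', 'Eve'],
                   'Count': [10, 3, 14, 8, 5, 2]})

grouped = df.groupby('Name')['Count']
asc_rank = grouped.rank()                  # default: smallest count gets rank 1
desc_rank = grouped.rank(ascending=False)  # largest count gets rank 1

# Hypothetical data with a tie, to show how the ranking method matters.
tied = pd.DataFrame({'Name': ['Eve', 'Eve', 'Eve'], 'Count': [5, 5, 2]})
tied_grouped = tied.groupby('Name')['Count']
avg_rank = tied_grouped.rank(ascending=False)                    # ties share the average rank
first_rank = tied_grouped.rank(ascending=False, method='first')  # ties get distinct whole ranks
```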

Then:

df['New_col'] = (df_nounique['Name'] + '_' + df_nounique['rank'].astype(str)) \
                    .combine_first(df['Name'])

>>> df
   Id   Name  Count  New_col
0   1    Eve     10    Eve_1
1   2  Diana      3  Diana_2
2   3  Diana     14  Diana_1
3   4    Mia      8      Mia
4   5    Eve      5    Eve_2
5   6    Eve      2    Eve_3
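The second step relies on index alignment: the concatenated Series only has entries for the duplicated names, and combine_first fills the remaining index labels from df['Name']. A minimal sketch of that behaviour, using small hand-made Series whose values are purely illustrative:

```python
import pandas as pd

names = pd.Series(['Eve', 'Diana', 'Mia'], index=[0, 1, 3])
# Labels exist only for some index values, and in a different order.
labels = pd.Series(['Diana_2', 'Eve_1'], index=[1, 0])

# combine_first aligns on the index: values from `labels` win where present,
# and the remaining positions fall back to `names`.
result = labels.combine_first(names)
```

Because the match is by index label rather than by position, the sort order of df_nounique does not affect which row receives which label.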

Answer 2

We can also create the Series directly from df, without needing df_nounique, by:

Generating the Series from groupby rank (with ascending=False and method='dense' to ensure whole-number steps)
Using fillna to fill the missing values with Name
Joining back to the DataFrame with join (Series.rename is needed to assign the new column name, as join only works with a named Series):

df = df.join(
    (df['Name'] + '_' + df[df.duplicated(subset=['Name'], keep=False)]
     .groupby('Name')['Count']
     .rank(ascending=False, method='dense')
     .map('{:.0f}'.format)).fillna(df['Name']).rename('New_col')
)

df:

   Id   Name  Count  New_col
0   1    Eve     10    Eve_1
1   2  Diana      3  Diana_2
2   3  Diana     14  Diana_1
3   4    Mia      8      Mia
4   5    Eve      5    Eve_2
5   6    Eve      2    Eve_3
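A possible variation on the same idea, offered only as a sketch: instead of filtering with duplicated and filling the gaps afterwards, compute the rank for every row and use the group size to decide where the suffix applies. The names rank and group_size are introduced here for illustration, and method='first' is an assumption about how ties should be handled.

```python
import pandas as pd

df = pd.DataFrame({'Id': [1, 2, 3, 4, 5, 6],
                   'Name': ['Eve', 'Diana', 'Diana', 'Mia', 'Eve', 'Eve'],
                   'Count': [10, 3, 14, 8, 5, 2]})

# Rank within each name, largest count first; method='first' keeps ranks whole.
rank = (df.groupby('Name')['Count']
          .rank(ascending=False, method='first')
          .astype(int))

# Number of rows sharing each name, broadcast back to every row.
group_size = df.groupby('Name')['Name'].transform('size')

# Append the rank only where the name occurs more than once.
df['New_col'] = (df['Name'] + '_' + rank.astype(str)).where(group_size > 1, df['Name'])
```

This keeps everything on the original index, so no join or rename is needed.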

Answer 3

Although an answer has already been accepted, I think this code is not bad either. Take a look:

# module

import pandas as pd
import numpy as np

# make a dataset

df = pd.DataFrame({'Id':[1,2,3,4,5,6], 'Name':['Eve','Diana','Diana','Mia','Eve','Eve'], "Count":[10,3,14,8,5,2]})
print(df)


# rank and make new column

# rank within each name, largest count first; cast to int before str so the label is 'Eve_1', not 'Eve_1.0'
df['rank'] = df.groupby('Name')['Count'].rank(ascending=False).astype(int).astype(str)
# set rank to NaN where the name is unique
df.loc[~df.duplicated(subset=['Name'], keep=False), 'rank'] = np.nan
# ranked names become 'Name_rank'; unique names keep the plain name
df['New_col'] = (df['Name'] + '_' + df['rank']).fillna(df['Name'])
print(df)
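One detail to watch when building labels from ranks: rank() returns floats, so converting them straight to strings keeps the decimal part. A small sketch, with a hand-made Series standing in for the rank output:

```python
import pandas as pd

ranks = pd.Series([1.0, 2.0, 3.0])  # rank() returns floats like these

direct = ranks.astype(str)               # keeps the decimal part: '1.0', '2.0', '3.0'
via_int = ranks.astype(int).astype(str)  # drops it first: '1', '2', '3'
```

Casting through int is only safe when the ranks are whole numbers, which holds here because the counts within each name are distinct.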

3 answers

Answer 1
2 accepted 2021-07-26 04:26:30

Answer 2
1 2021-07-26 04:40:35

Answer 3
1 2021-07-26 04:55:42
