How to assign a number in a pandas DataFrame to each unique value appearing in the rows, based on a given column
The DataFrame looks like this:
Unique_Id  Date
H1         2/03/2022
H1         2/03/2022
H1         2/03/2022
H1         3/03/2022
H1         4/03/2022
H2         9/03/2022
H2         9/03/2022
H2         10/03/2022
Expected DataFrame:
Unique_Id  Date        Count
H1         2/03/2022   1
H1         2/03/2022   1
H1         2/03/2022   1
H1         3/03/2022   2
H1         4/03/2022   3
H2         9/03/2022   1
H2         9/03/2022   1
H2         10/03/2022  2
Repeated dates should be assigned the number 1; other dates should be assigned some other number.
I have tried multiple approaches; please help.
There are a bunch of ways to do this. The primary issue is that you need to treat the date as a date object rather than a string; otherwise 10/03/2022 gets sorted ahead of 9/03/2022 in your second group, because as strings "10" compares less than "9".
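A minimal illustration of the string-sorting problem, using the H2 dates from the question. The `format='%d/%m/%Y'` argument is an assumption that the dates are day/month/year (they read as March 2022); for this data, month/day/year would give the same ordering within each group.

```python
import pandas as pd

# The H2 dates from the question, still stored as strings.
dates = pd.Series(['9/03/2022', '9/03/2022', '10/03/2022'])

# Ranking strings compares them character by character, so '10/...' < '9/...'.
as_strings = dates.rank(method='dense')

# Ranking parsed datetimes gives chronological order.
# Assumption: the input is day/month/year.
as_dates = pd.to_datetime(dates, format='%d/%m/%Y').rank(method='dense')

print(as_strings.tolist())  # [2.0, 2.0, 1.0]
print(as_dates.tolist())    # [1.0, 1.0, 2.0]
```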
import pandas as pd
df = pd.DataFrame({'Unique_Id': ['H1', 'H1', 'H1', 'H1', 'H1', 'H2', 'H2', 'H2'],
                   'Date': ['2/03/2022',
                            '2/03/2022',
                            '2/03/2022',
                            '3/03/2022',
                            '4/03/2022',
                            '9/03/2022',
                            '9/03/2022',
                            '10/03/2022']})
Dense Rank
df.groupby('Unique_Id')['Date'].transform(lambda x: pd.to_datetime(x).rank(method='dense').astype(int))
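A minimal end-to-end sketch of the dense-rank approach that writes the result back as the `Count` column. It parses the whole column once and uses `GroupBy.rank`, which keeps the original row index so the result can be assigned directly. The day/month/year format is an assumption about the data.

```python
import pandas as pd

df = pd.DataFrame({
    'Unique_Id': ['H1', 'H1', 'H1', 'H1', 'H1', 'H2', 'H2', 'H2'],
    'Date': ['2/03/2022', '2/03/2022', '2/03/2022', '3/03/2022',
             '4/03/2022', '9/03/2022', '9/03/2022', '10/03/2022'],
})

# Parse once up front. Assumption: the input is day/month/year.
dates = pd.to_datetime(df['Date'], format='%d/%m/%Y')

# Dense rank within each Unique_Id: equal dates share a number, no gaps.
# rank() returns floats, hence astype(int).
df['Count'] = dates.groupby(df['Unique_Id']).rank(method='dense').astype(int)
print(df)
```

`method='dense'` is what makes the numbering go 1, 2, 3 without gaps after ties; the default `'average'` would give the three 2/03/2022 rows a rank of 2.0 instead.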
Category Codes
df.groupby('Unique_Id')['Date'].transform(lambda x: pd.to_datetime(x).astype('category').cat.codes + 1)
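A sketch of why the category-codes approach needs the groupby. When no categories are given, `astype('category')` uses the sorted unique values as categories, so the codes follow chronological order; computing them per group restarts the numbering for each `Unique_Id`. The helper name `dense_codes` is hypothetical, and the day/month/year format is an assumption.

```python
import pandas as pd

df = pd.DataFrame({
    'Unique_Id': ['H1', 'H1', 'H1', 'H1', 'H1', 'H2', 'H2', 'H2'],
    'Date': ['2/03/2022', '2/03/2022', '2/03/2022', '3/03/2022',
             '4/03/2022', '9/03/2022', '9/03/2022', '10/03/2022'],
})
dates = pd.to_datetime(df['Date'], format='%d/%m/%Y')  # assumption: day/month/year

def dense_codes(s: pd.Series) -> pd.Series:
    # Hypothetical helper: categories are the sorted unique values of s,
    # so the 0-based codes are chronological; +1 makes them start at 1.
    return s.astype('category').cat.codes + 1

# Per group: categories are rebuilt for each Unique_Id, so numbering restarts.
per_group = dates.groupby(df['Unique_Id']).transform(dense_codes)

# Whole column: one shared set of categories, so H2 continues after H1.
whole_column = dense_codes(dates)

print(per_group.tolist())     # [1, 1, 1, 2, 3, 1, 1, 2]
print(whole_column.tolist())  # [1, 1, 1, 2, 3, 4, 4, 5]
```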
Factorize (numbers dates in order of first appearance, so this assumes each group is already sorted by date)
df.groupby('Unique_Id')['Date'].transform(lambda x: x.factorize()[0] + 1)
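A sketch of the ordering caveat for `factorize`, using a hypothetical unsorted group in which the later date appears first. By default, codes follow order of first appearance; passing `sort=True` on parsed datetimes makes them chronological. The day/month/year format is an assumption.

```python
import pandas as pd

# Hypothetical unsorted group: the later date comes first.
s = pd.Series(['10/03/2022', '9/03/2022', '9/03/2022'])

# Default factorize: codes follow order of first appearance.
by_appearance = s.factorize()[0] + 1

# Parse first, then sort=True orders the uniques chronologically.
# Assumption: the input is day/month/year.
parsed = pd.to_datetime(s, format='%d/%m/%Y')
chronological = parsed.factorize(sort=True)[0] + 1

print(by_appearance.tolist())  # [1, 2, 2]
print(chronological.tolist())  # [2, 1, 1]
```

On the question's data both versions agree, because each group is already in date order.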
Here is one way to do it using groupby and transform.
"Repetitive dates should be assigned with number 1, else other should be assigned some other number" is what the question states, so I chose 2 where the values are unique.
# 1 if the (Unique_Id, Date) pair occurs more than once, otherwise 2
df['count'] = df.groupby(['Unique_Id', 'Date'])['Date'].transform(lambda x: 1 if x.size > 1 else 2)
df
Unique_Id Date count
0 H1 2/03/2022 1
1 H1 2/03/2022 1
2 H1 2/03/2022 1
3 H1 3/03/2022 2
4 H1 4/03/2022 2
5 H2 9/03/2022 1
6 H2 9/03/2022 1
7 H2 10/03/2022 2
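The same "1 if repeated, otherwise 2" rule can also be sketched with the built-in `'size'` aggregation and `numpy.where` instead of a Python lambda. Note that this rule gives 2 rather than 3 for 4/03/2022, so it follows the question's wording rather than its expected output.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Unique_Id': ['H1', 'H1', 'H1', 'H1', 'H1', 'H2', 'H2', 'H2'],
    'Date': ['2/03/2022', '2/03/2022', '2/03/2022', '3/03/2022',
             '4/03/2022', '9/03/2022', '9/03/2022', '10/03/2022'],
})

# Number of rows in each (Unique_Id, Date) pair, broadcast back to every row.
sizes = df.groupby(['Unique_Id', 'Date'])['Date'].transform('size')

# 1 for dates that repeat within their Unique_Id, 2 for dates seen once.
df['count'] = np.where(sizes > 1, 1, 2)
print(df['count'].tolist())  # [1, 1, 1, 2, 2, 1, 1, 2]
```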