

Python: Pandas how to add a column to duplicated values of a dataframe which are in ascending order?

I have the following dataframe:

name  date
test   2022-03-04
test   2022-03-05
test   2022-03-06
test   2022-03-17
test   2022-03-18
test   2022-03-21
test2  2022-03-04
test2  2022-03-05
test2  2022-03-15
test2  2022-03-19
test2  2022-03-21
test2  2022-04-16
test3  2022-03-14
test3  2022-03-15
test3  2022-03-23
test3  2022-03-27
test4  2022-03-20
test4  2022-04-15
test4  2022-04-17
test5  2022-03-01
test5  2022-03-04
test5  2022-03-06
test5  2022-03-12
test5  2022-04-04
test5  2022-04-10
test5  2022-04-14
test5  2022-05-04
test6  2022-03-05
test6  2022-03-15
test6  2022-06-20
test6  2022-06-24
test6  2022-06-27

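How can I add a column old_data with the value yes to the older rows of each duplicated name, i.e. every row except the 3 most recent dates, for names that have more than 3 rows? The date column is in ascending order within each name.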

I want to produce this output:

name  date           old_data
test    2022-03-04  yes
test    2022-03-05  yes
test    2022-03-06  yes
test    2022-03-17  
test    2022-03-18  
test    2022-03-21  
test2   2022-03-04  yes
test2   2022-03-05  yes
test2   2022-03-15  yes
test2   2022-03-19  
test2   2022-03-21
test2   2022-04-16  
test3   2022-03-14  yes
test3   2022-03-15  
test3   2022-03-23  
test3   2022-03-27  
test4   2022-03-20  
test4   2022-04-15  
test4   2022-04-17  
test5   2022-03-01  yes
test5   2022-03-04  yes
test5   2022-03-06  yes
test5   2022-03-12  yes
test5   2022-04-04  yes
test5   2022-04-10  
test5   2022-04-14  
test5   2022-05-04  
test6   2022-03-05  yes
test6   2022-03-15  yes
test6   2022-06-20  
test6   2022-06-24  
test6   2022-06-27

Here is my attempt:

df['old_data'] = np.where(df.groupby('name').cumcount().ge(4), 'yes', '')
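For reference, a minimal sketch of what this attempt does on the six test rows. The pos column is a hypothetical helper added only for illustration: cumcount() numbers rows from the top of each group, i.e. from the oldest date, so .ge(4) flags the newest rows instead of the oldest ones.

```python
import numpy as np
import pandas as pd

# A small slice of the question's data: the six rows for "test".
df = pd.DataFrame({
    "name": ["test"] * 6,
    "date": ["2022-03-04", "2022-03-05", "2022-03-06",
             "2022-03-17", "2022-03-18", "2022-03-21"],
})

# cumcount() numbers rows 0, 1, 2, ... from the top of each group,
# which is the OLDEST date when dates are sorted ascending.
df["pos"] = df.groupby("name").cumcount()

# The attempt: .ge(4) flags positions 4 and 5, the two NEWEST rows.
df["old_data"] = np.where(df["pos"].ge(4), "yes", "")
print(df)
```

The last two rows (2022-03-18 and 2022-03-21) get yes, while the desired output marks the first three.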

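Answer: use GroupBy.cumcount with ascending=False so the counter runs in descending order (the newest row in each group gets 0), then use .ge(3) instead of .ge(4):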

df['old_data'] = np.where(df.groupby('name').cumcount(ascending=False).ge(3), 'yes', '')
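A self-contained sketch of this answer run on the question's sample data. The dates dictionary is only a compact way to rebuild the table above; it assumes, as the question states, that dates are already ascending within each name.

```python
import numpy as np
import pandas as pd

# The sample data from the question, one list of ascending dates per name.
dates = {
    "test":  ["2022-03-04", "2022-03-05", "2022-03-06",
              "2022-03-17", "2022-03-18", "2022-03-21"],
    "test2": ["2022-03-04", "2022-03-05", "2022-03-15",
              "2022-03-19", "2022-03-21", "2022-04-16"],
    "test3": ["2022-03-14", "2022-03-15", "2022-03-23", "2022-03-27"],
    "test4": ["2022-03-20", "2022-04-15", "2022-04-17"],
    "test5": ["2022-03-01", "2022-03-04", "2022-03-06", "2022-03-12",
              "2022-04-04", "2022-04-10", "2022-04-14", "2022-05-04"],
    "test6": ["2022-03-05", "2022-03-15", "2022-06-20",
              "2022-06-24", "2022-06-27"],
}
df = pd.DataFrame(
    [(name, d) for name, ds in dates.items() for d in ds],
    columns=["name", "date"],
)

# cumcount(ascending=False) numbers each group from the bottom:
# the newest row gets 0, the next newest 1, and so on.
from_newest = df.groupby("name").cumcount(ascending=False)

# Positions 0, 1, 2 are the three most recent rows; everything else is old.
df["old_data"] = np.where(from_newest.ge(3), "yes", "")
print(df)
```

Per name this marks 3, 3, 1, 0, 5 and 2 rows, matching the expected output; test4 has only three rows, so none of them is marked.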

Another idea, using GroupBy.rank:

import numpy as np
import pandas as pd

df['date'] = pd.to_datetime(df['date'])

m = df.groupby('name')['date'].rank(method='dense', ascending=False).gt(3)
df['old_data'] = np.where(m, 'yes', '')
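On the question's data both answers give the same result because no name repeats a date. They can differ when a name has the same date more than once: cumcount counts rows, while a dense rank counts distinct dates. A minimal sketch with hypothetical tied data (the name "a" and its dates are not from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical data: "a" has the date 2022-03-21 twice.
df = pd.DataFrame({
    "name": ["a"] * 5,
    "date": pd.to_datetime(["2022-03-04", "2022-03-05", "2022-03-17",
                            "2022-03-21", "2022-03-21"]),
})

# Row-based: the three bottom rows stay unmarked, even though two share a date.
by_row = df.groupby("name").cumcount(ascending=False).ge(3)

# Date-based: tied dates share one dense rank, so the three most recent
# DISTINCT dates stay unmarked.
by_date = df.groupby("name")["date"].rank(method="dense", ascending=False).gt(3)

df["old_by_row"] = np.where(by_row, "yes", "")
df["old_by_date"] = np.where(by_date, "yes", "")
print(df)
```

Here the row-based rule marks the first two rows, while the date-based rule marks only 2022-03-04. The rank version also ranks by the date values themselves, so it does not depend on the rows being sorted.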

print(df)

     name        date old_data
0    test  2022-03-04      yes
1    test  2022-03-05      yes
2    test  2022-03-06      yes
3    test  2022-03-17         
4    test  2022-03-18         
5    test  2022-03-21         
6   test2  2022-03-04      yes
7   test2  2022-03-05      yes
8   test2  2022-03-15      yes
9   test2  2022-03-19         
10  test2  2022-03-21         
11  test2  2022-04-16         
12  test3  2022-03-14      yes
13  test3  2022-03-15         
14  test3  2022-03-23         
15  test3  2022-03-27         
16  test4  2022-03-20         
17  test4  2022-04-15         
18  test4  2022-04-17         
19  test5  2022-03-01      yes
20  test5  2022-03-04      yes
21  test5  2022-03-06      yes
22  test5  2022-03-12      yes
23  test5  2022-04-04      yes
24  test5  2022-04-10         
25  test5  2022-04-14         
26  test5  2022-05-04         
27  test6  2022-03-05      yes
28  test6  2022-03-15      yes
29  test6  2022-06-20         
30  test6  2022-06-24         
31  test6  2022-06-27         
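The cumcount version relies on the dates already being ascending within each name, as the question states. If that is not guaranteed, one option (a sketch, not part of the original answer, using hypothetical shuffled data) is to sort a copy first and let the index labels carry the result back to the original row order:

```python
import numpy as np
import pandas as pd

# Hypothetical shuffled input: the same six "test" dates, out of order.
df = pd.DataFrame({
    "name": ["test"] * 6,
    "date": pd.to_datetime(["2022-03-18", "2022-03-04", "2022-03-21",
                            "2022-03-06", "2022-03-17", "2022-03-05"]),
})

# Sort a copy so cumcount sees each group in ascending date order.
ordered = df.sort_values(["name", "date"])
from_newest = ordered.groupby("name").cumcount(ascending=False)

# from_newest keeps the original index labels, so reindexing puts the
# values back in df's row order before building the column.
df["old_data"] = np.where(from_newest.reindex(df.index).ge(3), "yes", "")
print(df)
```

The three oldest dates (2022-03-04, 2022-03-05, 2022-03-06) are marked wherever they appear in the frame.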


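Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost this content, please credit this site's URL or the original source. For any questions, contact: yoyou2525@163.com.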

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM