Within group assign latest column value by date to other rows in pandas
I have a dataframe that looks like this:
import pandas as pd
from datetime import date

df = pd.DataFrame({'a': ['cust1', 'cust1', 'cust1', 'cust2', 'cust2', 'cust3', 'cust3', 'cust3'],
                   'date': [date(2019, 1, 20), date(2019, 6, 15), date(2020, 6, 12), date(2017, 12, 15), date(2018, 12, 10), date(2017, 1, 5), date(2018, 1, 15), date(2019, 2, 20)],
                   'ID': ['AA', 'AA', 'bb', 'CC', 'd1', 'GG', 'GG', 'GG'],
                   'c': [9, 9, 8, 4, 8, 3, 6, 4]})
       a        date  ID  c
0  cust1  2019-01-20  AA  9
1  cust1  2019-06-15  AA  9
2  cust1  2020-06-12  bb  8
3  cust2  2017-12-15  CC  4
4  cust2  2018-12-10  d1  8
5  cust3  2017-01-05  GG  3
6  cust3  2018-01-15  GG  6
7  cust3  2019-02-20  GG  4
I want to assign the most recent value (by date) of column 'ID' to all rows within each group of 'a'.
My resulting dataframe should look like this:
       a        date  c  ID
0  cust1  2019-01-20  9  AA
1  cust1  2019-06-15  9  AA
2  cust1  2020-06-12  8  AA
3  cust2  2017-12-15  4  CC
4  cust2  2018-12-10  8  CC
5  cust3  2017-01-05  3  GG
6  cust3  2018-01-15  6  GG
7  cust3  2019-02-20  4  GG
I can achieve this with something like the following, but I am wondering if there is a simple one-liner.
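# Sort by date, then keep one row per customer (drop_duplicates keeps the first row by default)
new_id = df.sort_values('date').drop_duplicates('a')
# Drop the old ID column and merge the per-customer ID back in
df = df.drop(columns='ID')
df_new = df.merge(new_id[['a', 'ID']], how='left', on='a')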
You can use groupby with transform: sort by date, then broadcast the last 'ID' of each group to every row in that group.
df = df.sort_values('date')
df['new'] = df.groupby('a').ID.transform('last')
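
As a minimal, self-contained sketch of the approach above (not a definitive implementation), the snippet below rebuilds the sample data from the question and applies transform after sorting by date. The column names latest_id and earliest_id are illustrative and do not appear in the original post. Note that with this data 'last' gives bb, d1 and GG, whereas the expected table in the question matches what 'first' returns, so pick whichever matches the intent.

```python
from datetime import date

import pandas as pd

df = pd.DataFrame({
    'a': ['cust1', 'cust1', 'cust1', 'cust2', 'cust2', 'cust3', 'cust3', 'cust3'],
    'date': [date(2019, 1, 20), date(2019, 6, 15), date(2020, 6, 12),
             date(2017, 12, 15), date(2018, 12, 10),
             date(2017, 1, 5), date(2018, 1, 15), date(2019, 2, 20)],
    'ID': ['AA', 'AA', 'bb', 'CC', 'd1', 'GG', 'GG', 'GG'],
    'c': [9, 9, 8, 4, 8, 3, 6, 4],
})

# Sort by date so that "first"/"last" within each group mean earliest/latest.
df = df.sort_values('date')

# Most recent ID per customer, broadcast to every row of that customer.
df['latest_id'] = df.groupby('a')['ID'].transform('last')

# Earliest ID per customer, shown only for comparison (illustrative column).
df['earliest_id'] = df.groupby('a')['ID'].transform('first')

# Restore the original row order for display.
df = df.sort_index()
print(df)
```

Because transform returns a result aligned to the original index, the new columns line up with the right rows even though the frame was sorted before grouping. To overwrite the existing column instead, assign the result to df['ID'].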