
Within group, assign latest column value by date to other rows in pandas

I have a DataFrame that looks like this:

from datetime import date
import pandas as pd

df = pd.DataFrame({'a': ['cust1', 'cust1', 'cust1', 'cust2', 'cust2', 'cust3', 'cust3', 'cust3'],
                   'date': [date(2019, 1, 20), date(2019, 6, 15), date(2020, 6, 12), date(2017, 12, 15), date(2018, 12, 10), date(2017, 1, 5), date(2018, 1, 15), date(2019, 2, 20)],
                   'ID': ['AA', 'AA', 'bb', 'CC', 'd1', 'GG', 'GG', 'GG'],
                   'c': [9, 9, 8, 4, 8, 3, 6, 4]})

       a        date  ID  c
0  cust1  2019-01-20  AA  9
1  cust1  2019-06-15  AA  9
2  cust1  2020-06-12  bb  8
3  cust2  2017-12-15  CC  4
4  cust2  2018-12-10  d1  8
5  cust3  2017-01-05  GG  3
6  cust3  2018-01-15  GG  6
7  cust3  2019-02-20  GG  4

I want to assign the most recent value (by date) of column 'ID' to all rows within each group of 'a'.

My resulting DataFrame should look like this:

       a        date  c  ID
0  cust1  2019-01-20  9  bb
1  cust1  2019-06-15  9  bb
2  cust1  2020-06-12  8  bb
3  cust2  2017-12-15  4  d1
4  cust2  2018-12-10  8  d1
5  cust3  2017-01-05  3  GG
6  cust3  2018-01-15  6  GG
7  cust3  2019-02-20  4  GG

I can achieve this with something like the following, but I am wondering if there is a simple one-liner.

# keep='last' keeps the most recent row per customer after sorting by date
new_id = df.sort_values('date').drop_duplicates('a', keep='last')
df = df.drop(columns='ID')
df_new = df.merge(new_id[['a', 'ID']], how='left', on='a')

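You can sort by date and then use groupby with transform('last'), which broadcasts the last 'ID' of each group to every row in that group: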

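# Sort so that the last row of each group is the most recent one
df = df.sort_values('date')
# Broadcast each group's last ID to all of its rows
df['new'] = df.groupby('a')['ID'].transform('last')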
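
The same idea can probably be folded into the one-liner the question asks for. The sketch below is only one way to do it: it sorts a temporary copy, so the original row order of df is left alone, and it relies on index alignment to write the values back into 'ID'. The sample data is copied from the question. Overwriting 'ID' in place, rather than adding a 'new' column, is an assumption about what the asker wants.

```python
from datetime import date

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'a': ['cust1', 'cust1', 'cust1', 'cust2', 'cust2', 'cust3', 'cust3', 'cust3'],
    'date': [date(2019, 1, 20), date(2019, 6, 15), date(2020, 6, 12),
             date(2017, 12, 15), date(2018, 12, 10),
             date(2017, 1, 5), date(2018, 1, 15), date(2019, 2, 20)],
    'ID': ['AA', 'AA', 'bb', 'CC', 'd1', 'GG', 'GG', 'GG'],
    'c': [9, 9, 8, 4, 8, 3, 6, 4],
})

# Sort a temporary copy by date and take the last ID per group.
# The result keeps the original index labels, so the assignment
# aligns on the index and df itself is never reordered.
df['ID'] = df.sort_values('date').groupby('a')['ID'].transform('last')

print(df)
```

With this data, every cust1 row ends up with 'bb', every cust2 row with 'd1', and every cust3 row with 'GG'. If two rows in a group share the same latest date, which one counts as "last" depends on the sort, so a tie-breaking column may be needed in that case.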


 