New Pandas dataframe column - latest date for every ID and product

Question

I have a dataframe like this:

Index   ID       Item       Date
0       001      A          01/01/19
1       001      B          01/03/19
2       002      A          01/04/19
3       001      A          01/05/19
4       003      B          01/03/19
5       002      A          01/01/19

I would like to create a column that contains the latest date for every ID and Product. Currently, I am only able to get the latest date of the whole dataset, or the same date for every row, with this code:

df['New Column Date'] = df['Date'].values[-1]

But the output should be like this:

Index   ID       Item      Date      New_column_date
0       001      A         01/01/19  NaN
1       001      B         01/03/19  NaN
2       002      A         01/04/19  NaN
3       001      A         01/05/19  01/01/19
4       003      B         01/03/19  NaN
5       002      A         01/01/19  01/04/19

Note: when there is no earlier date, the value should be zero or NaN.

Any solutions?

Answer 1

IIUC, you want groupby.shift:

df['new column']=df.groupby(['ID','Item'])['Date'].shift()
print(df)

   ID Item      Date new column
0   1    A  01/01/19        NaN
1   1    B  01/03/19        NaN
2   2    A  01/04/19        NaN
3   1    A  01/05/19   01/01/19
4   3    B  01/03/19        NaN
5   2    A  01/01/19   01/04/19
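
For anyone who wants to run this end to end, here is a minimal sketch. The frame construction, the column name `prev_date`, and the month/day/year date format are assumptions made for illustration and are not part of the question. Note that `shift` follows row order within each group, not date order, so it returns the date from the previous row of the same ID and Item rather than a true maximum.

```python
import pandas as pd

# Sample data rebuilt from the question; IDs kept as strings to preserve leading zeros.
df = pd.DataFrame({
    "ID": ["001", "001", "002", "001", "003", "002"],
    "Item": ["A", "B", "A", "A", "B", "A"],
    "Date": ["01/01/19", "01/03/19", "01/04/19", "01/05/19", "01/03/19", "01/01/19"],
})

# Assumption: the dates are month/day/year.
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%y")

# Within each (ID, Item) group, take the Date from the previous row of that group.
# The first row of every group has no predecessor and gets NaT.
df["prev_date"] = df.groupby(["ID", "Item"])["Date"].shift()
print(df)
```

If the rows are not already in the order you care about, sorting before the `groupby` would change which row counts as "previous".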

Answer 2

IIUC, we can use groupby + transform + max on your date column to get the latest date, then filter by duplicates and apply the logic.

The only difference from your expected output is that ID 1 for Item A should be the 5th of Jan 2019?

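# Convert to datetime first so that max compares dates, not strings
df['Date'] = pd.to_datetime(df['Date'], format='%m/%d/%y')
s = df.groupby(['ID','Item'])['Date'].transform('max')
df.loc[df.duplicated(subset=['ID','Item']),'new_date'] = s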
print(df)


   ID    Item       Date   new_date
0  001    A 2019-01-01        NaT
1  001    B 2019-01-03        NaT
2  002    A 2019-01-04        NaT
3  001    A 2019-01-05 2019-01-05
4  003    B 2019-01-03        NaT
5  002    A 2019-01-01 2019-01-04
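
The same idea as a self-contained sketch, in case it helps to see the intermediate pieces. The frame construction, the helper names `latest` and `is_repeat`, and the month/day/year format are assumptions for illustration. It uses `Series.where` instead of a `.loc` assignment, which is a small variation on the code above and not a different technique.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "ID": ["001", "001", "002", "001", "003", "002"],
    "Item": ["A", "B", "A", "A", "B", "A"],
    "Date": ["01/01/19", "01/03/19", "01/04/19", "01/05/19", "01/03/19", "01/01/19"],
})

# Assumption: the dates are month/day/year.
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%y")

# Latest date per (ID, Item), broadcast back to every row of the group.
latest = df.groupby(["ID", "Item"])["Date"].transform("max")

# True for every row whose (ID, Item) pair has already appeared higher up.
is_repeat = df.duplicated(subset=["ID", "Item"])

# Keep the group maximum only on repeated rows; first occurrences become NaT.
df["new_date"] = latest.where(is_repeat)
print(df)
```

Because `transform('max')` looks at the whole group, a repeated row can receive its own date (as row 3 does here), which is where this differs from the `shift` approach.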

Solution 1
3, accepted, 2019-12-31 15:29:47

Solution 2
1, 2019-12-31 15:18:29