New Pandas Dataframe Column - latest date for every ID and product
I have a dataframe like this:
Index ID Item Date
0 001 A 01/01/19
1 001 B 01/03/19
2 002 A 01/04/19
3 001 A 01/05/19
4 003 B 01/03/19
5 002 A 01/01/19
I would like to create a column that contains the latest date for every ID and Product. Currently, I am only able to get the latest date of the whole dataset, or the same date for every row, with this code:
df['New Column Date'] = df['Date'].values[-1]
But the output should be like this:
Index ID Item Date New_column_date
0 001 A 01/01/19 NaN
1 001 B 01/03/19 NaN
2 002 A 01/04/19 NaN
3 001 A 01/05/19 01/01/19
4 003 B 01/03/19 NaN
5 002 A 01/01/19 01/04/19
Note: when we don't have an earlier date, the value should be zero or NaN.
Any solutions?
IIUC, you want groupby.shift:
df['new column']=df.groupby(['ID','Item'])['Date'].shift()
print(df)
ID Item Date new column
0 1 A 01/01/19 NaN
1 1 B 01/03/19 NaN
2 2 A 01/04/19 NaN
3 1 A 01/05/19 01/01/19
4 3 B 01/03/19 NaN
5 2 A 01/01/19 01/04/19
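For reference, here is a self-contained sketch of the shift approach on the sample data from the question. The DataFrame construction is an assumption (IDs are kept as strings to preserve the leading zeros). Note that shift returns the previous row's date within each (ID, Item) group in row order. That matches the expected output here, but it is not necessarily the latest date if the rows are not sorted.

```python
import pandas as pd

# Sample data rebuilt from the question (construction is an assumption)
df = pd.DataFrame({
    "ID": ["001", "001", "002", "001", "003", "002"],
    "Item": ["A", "B", "A", "A", "B", "A"],
    "Date": ["01/01/19", "01/03/19", "01/04/19",
             "01/05/19", "01/03/19", "01/01/19"],
})

# For each (ID, Item) group, take the Date from the previous row of that group;
# the first row of every group has no predecessor and becomes NaN
df["new column"] = df.groupby(["ID", "Item"])["Date"].shift()

print(df)
```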
IIUC, we can use groupby + transform + max on your date column to get the latest date, then filter by duplicates and apply the logic. The only difference is that ID 1 for Item A should be the 5th of Jan 2019?
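import pandas as pd

# parse the strings so that max compares real dates (the output below shows datetimes)
df['Date'] = pd.to_datetime(df['Date'])
# latest date per (ID, Item), broadcast back to every row of the group
s = df.groupby(['ID','Item'])['Date'].transform('max')
# only rows that repeat an earlier (ID, Item) pair receive a value
df.loc[df.duplicated(subset=['ID','Item']),'new_date'] = s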
print(df)
ID Item Date new_date
0 001 A 2019-01-01 NaT
1 001 B 2019-01-03 NaT
2 002 A 2019-01-04 NaT
3 001 A 2019-01-05 2019-01-05
4 003 B 2019-01-03 NaT
5 002 A 2019-01-01 2019-01-04
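A self-contained sketch of the same idea, in case it helps to run it end to end. The DataFrame construction and the explicit month/day/year format are assumptions based on the sample data. Series.where is used here as an equivalent variation on the .loc assignment: it keeps the group maximum only on rows that repeat an earlier (ID, Item) pair and leaves NaT elsewhere.

```python
import pandas as pd

# Sample data rebuilt from the question (construction is an assumption)
df = pd.DataFrame({
    "ID": ["001", "001", "002", "001", "003", "002"],
    "Item": ["A", "B", "A", "A", "B", "A"],
    "Date": ["01/01/19", "01/03/19", "01/04/19",
             "01/05/19", "01/03/19", "01/01/19"],
})

# Assumed format: month/day/two-digit year, matching the output shown above
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%y")

# Latest date per (ID, Item), broadcast back to every row of the group
latest = df.groupby(["ID", "Item"])["Date"].transform("max")

# True for every row whose (ID, Item) pair already appeared in an earlier row
is_repeat = df.duplicated(subset=["ID", "Item"])

# Keep the group maximum on repeated rows only; other rows become NaT
df["new_date"] = latest.where(is_repeat)

print(df)
```

Unlike shift, this gives 2019-01-05 for ID 001 / Item A, because it takes the group's overall maximum rather than the previous row's date.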