
New Pandas Dataframe Column - latest date for every ID and product

I have a dataframe like this:

Index   ID       Item       Date
0       001      A          01/01/19
1       001      B          01/03/19
2       002      A          01/04/19
3       001      A          01/05/19
4       003      B          01/03/19
5       002      A          01/01/19

I would like to create a column that contains the latest date for every ID and Product (the Item column). Currently, I am only able to get the latest date of the whole dataset, i.e. the same date for every row, with this code:

df['New Column Date'] = df['Date'].values[-1]

But the output should be like this:

Index   ID       Item      Date      New_column_date
0       001      A         01/01/19  NaN
1       001      B         01/03/19  NaN
2       002      A         01/04/19  NaN
3       001      A         01/05/19  01/01/19
4       003      B         01/03/19  NaN
5       002      A         01/01/19  01/04/19

Note: when there is no earlier date, the value should be zero or NaN.

Any solutions?

IIUC, you want groupby.shift:

df['new column']=df.groupby(['ID','Item'])['Date'].shift()
print(df)

   ID Item      Date new column
0   1    A  01/01/19        NaN
1   1    B  01/03/19        NaN
2   2    A  01/04/19        NaN
3   1    A  01/05/19   01/01/19
4   3    B  01/03/19        NaN
5   2    A  01/01/19   01/04/19
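
For a self-contained check, the sketch below rebuilds the sample frame from the question and applies the same groupby.shift call. Keeping ID as strings (to preserve the leading zeros) and the column name new_column_date are assumptions made for illustration only.

```python
import pandas as pd

# Sample data from the question; ID is kept as text (an assumption)
df = pd.DataFrame({
    "ID": ["001", "001", "002", "001", "003", "002"],
    "Item": ["A", "B", "A", "A", "B", "A"],
    "Date": ["01/01/19", "01/03/19", "01/04/19",
             "01/05/19", "01/03/19", "01/01/19"],
})

# shift() moves each value down one row within its (ID, Item) group,
# so every row receives the Date of the previous row of the same group;
# the first row of each group has no predecessor and gets NaN
df["new_column_date"] = df.groupby(["ID", "Item"])["Date"].shift()
print(df)
```

Note that shift follows row order, not chronological order. If the rows are not already in the order you care about, sorting them first would probably be needed.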

IIUC, we can use groupby + transform + max on your date column to get the latest date, then filter by duplicates and apply the logic.

The only difference from your expected output is that ID 1 for Item A should be the 5th of Jan 2019?

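# latest Date per (ID, Item) group; assumes Date was already parsed with pd.to_datetime
s = df.groupby(['ID','Item'])['Date'].transform('max')
# assign only on rows whose (ID, Item) pair has already appeared earlier
df.loc[df.duplicated(subset=['ID','Item']),'new_date'] = s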
print(df)


   ID    Item       Date   new_date
0  001    A 2019-01-01        NaT
1  001    B 2019-01-03        NaT
2  002    A 2019-01-04        NaT
3  001    A 2019-01-05 2019-01-05
4  003    B 2019-01-03        NaT
5  002    A 2019-01-01 2019-01-04
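
A runnable sketch of this second approach follows, under two assumptions that are not stated in the answer: the dates are month/day/year (inferred from "01/05/19" being read as the 5th of Jan), and Series.where is swapped in for the .loc assignment so that the new column stays a datetime column.

```python
import pandas as pd

# Sample data from the question; ID is kept as text (an assumption)
df = pd.DataFrame({
    "ID": ["001", "001", "002", "001", "003", "002"],
    "Item": ["A", "B", "A", "A", "B", "A"],
    "Date": ["01/01/19", "01/03/19", "01/04/19",
             "01/05/19", "01/03/19", "01/01/19"],
})

# Assumption: dates are month/day/year
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%y")

# Latest date within each (ID, Item) group, aligned to the original rows
latest = df.groupby(["ID", "Item"])["Date"].transform("max")

# True for every row whose (ID, Item) pair has already appeared earlier
is_repeat = df.duplicated(subset=["ID", "Item"])

# Keep the group maximum on repeated rows only; other rows become NaT
df["new_date"] = latest.where(is_repeat)
print(df)
```

Unlike groupby.shift, this fills repeated rows with the maximum date of the group rather than the date of the previous row, which is why row 3 shows 2019-01-05 here.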


