
How to get the most recent date based on a given date using Python?

Consider the following two dataframes:

Dataframe1 contains a list of users and stop dates.

[image]

Dataframe2 contains a history of user transactions and dates.

[image]

I want to get the last transaction date before the stop date for all users in Dataframe1 (some users in Dataframe1 have multiple stop dates).

I want the output to look like the following:

[image]

Here is one way to accomplish this (make sure both date columns are already datetime):

import pandas as pd

df = pd.merge(df1, df2, on="UserID")

# Latest transaction of the same user strictly before each stop date (NaT if none)
df["Last_Before_Stop"] = df.apply(
    lambda row: df.loc[
        (df["UserID"] == row["UserID"]) & (df["Transaction_Date"] < row["Stop_Date"]),
        "Transaction_Date",
    ].max(),
    axis=1,
)
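
As an alternative to the row-wise lookup, the same "latest transaction strictly before the stop date" idea could be sketched with pd.merge_asof, which is a different technique from the one used above. This is only a minimal sketch: the sample values are the ones posted elsewhere in this thread, the name result is made up for illustration, and it assumes both date columns are already datetime.

```python
import pandas as pd

# Sample data as posted in this thread, already converted to datetime
df1 = pd.DataFrame({
    "UserID": [1, 2, 3, 3],
    "Stop_Date": pd.to_datetime(
        ["2/2/22", "6/9/22", "7/25/22", "9/14/22"], format="%m/%d/%y"
    ),
})
df2 = pd.DataFrame({
    "UserID": [1, 1, 1, 2, 2, 3, 3, 3, 3, 4],
    "Transaction_Date": pd.to_datetime(
        ["1/2/22", "2/1/22", "2/3/22", "1/24/22", "3/22/22",
         "6/25/22", "7/20/22", "9/13/22", "9/14/22", "2/2/22"],
        format="%m/%d/%y",
    ),
})

# merge_asof needs both frames sorted by their date key.
# direction="backward" picks the closest earlier transaction per user;
# allow_exact_matches=False excludes a transaction on the stop date itself.
result = (
    pd.merge_asof(
        df1.sort_values("Stop_Date"),
        df2.sort_values("Transaction_Date"),
        left_on="Stop_Date",
        right_on="Transaction_Date",
        by="UserID",
        direction="backward",
        allow_exact_matches=False,
    )
    .sort_values(["UserID", "Stop_Date"])
    .reset_index(drop=True)
)
print(result)
```

With allow_exact_matches=False, user 3's stop date 9/14/22 is matched to the 9/13/22 transaction; leaving the parameter at its default of True would match the 9/14/22 transaction instead. Every row of df1 is kept, and a stop date with no earlier transaction gets NaT.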

Please always provide data in a form that makes it easy to use as samples (i.e. as text, not as images; see here).

You could try:

df1["Stop_Date"] = pd.to_datetime(df1["Stop_Date"], format="%m/%d/%y")
df2["Transaction_Date"] = pd.to_datetime(df2["Transaction_Date"], format="%m/%d/%y")
df = (
    df1.merge(df2, on="UserID", how="left")
    .loc[lambda df: df["Stop_Date"] >= df["Transaction_Date"]]
    .groupby(["UserID", "Stop_Date"])["Transaction_Date"].max()
    .to_frame().reset_index().drop(columns="Stop_Date")
)

  • Convert the date columns to datetime.
  • Merge df2 onto df1 on UserID.
  • Remove the rows whose Transaction_Date is later than Stop_Date.
  • Group the result by UserID and Stop_Date, and take the maximum Transaction_Date.
  • Bring the result into shape.

Result for

df1:

   UserID Stop_Date
0       1    2/2/22
1       2    6/9/22
2       3   7/25/22
3       3   9/14/22

df2:

   UserID Transaction_Date
0       1           1/2/22
1       1           2/1/22
2       1           2/3/22
3       2          1/24/22
4       2          3/22/22
5       3          6/25/22
6       3          7/20/22
7       3          9/13/22
8       3          9/14/22
9       4           2/2/22

is

   UserID Transaction_Date
0       1       2022-02-01
1       2       2022-03-22
2       3       2022-07-20
3       3       2022-09-14
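
To reproduce the result above from scratch, here is a minimal sketch that rebuilds the two sample frames from the posted tables and runs the same steps one at a time. The intermediate names merged, on_or_before and latest are made up for illustration, and Stop_Date is deliberately kept in latest so the two rows for user 3 can be told apart.

```python
import pandas as pd

# Sample frames rebuilt from the tables above (dates still as strings)
df1 = pd.DataFrame({
    "UserID": [1, 2, 3, 3],
    "Stop_Date": ["2/2/22", "6/9/22", "7/25/22", "9/14/22"],
})
df2 = pd.DataFrame({
    "UserID": [1, 1, 1, 2, 2, 3, 3, 3, 3, 4],
    "Transaction_Date": ["1/2/22", "2/1/22", "2/3/22", "1/24/22", "3/22/22",
                         "6/25/22", "7/20/22", "9/13/22", "9/14/22", "2/2/22"],
})

# Step 1: convert the date columns to datetime
df1["Stop_Date"] = pd.to_datetime(df1["Stop_Date"], format="%m/%d/%y")
df2["Transaction_Date"] = pd.to_datetime(df2["Transaction_Date"], format="%m/%d/%y")

# Step 2: pair every stop date with every transaction of the same user
merged = df1.merge(df2, on="UserID", how="left")

# Step 3: keep only transactions on or before the stop date
on_or_before = merged[merged["Stop_Date"] >= merged["Transaction_Date"]]

# Step 4: latest remaining transaction per (UserID, Stop_Date)
latest = (
    on_or_before
    .groupby(["UserID", "Stop_Date"])["Transaction_Date"]
    .max()
    .reset_index()
)
print(latest)
```

Because the filter uses >=, a transaction on the stop date itself counts, which is why user 3's second stop date maps to 2022-09-14. A user in df1 with no transaction on or before a stop date would drop out at step 3, since a comparison against NaT is False.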

If you don't want to permanently change the dtype to datetime, and also want the result as strings formatted like the input (but zero-padded), then you could try:

df = (
    df1
    .assign(Stop_Date=pd.to_datetime(df1["Stop_Date"], format="%m/%d/%y"))
    .merge(
        df2.assign(Transaction_Date=pd.to_datetime(df2["Transaction_Date"], format="%m/%d/%y")),
        on="UserID", how="left"
    )
    .loc[lambda df: df["Stop_Date"] >= df["Transaction_Date"]]
    .groupby(["UserID", "Stop_Date"])["Transaction_Date"].max()
    .to_frame().reset_index().drop(columns="Stop_Date")
    .assign(Transaction_Date=lambda df: df["Transaction_Date"].dt.strftime("%m/%d/%y"))
)

Result:

   UserID Transaction_Date
0       1         02/01/22
1       2         03/22/22
2       3         07/20/22
3       3         09/14/22
