How to get the most recent date based on a given date using Python?
Consider the following two dataframes:
Dataframe1 contains a list of users and stop_dates
Dataframe2 contains a history of user transactions and dates
I want to get the last transaction date before the stop date for all users in Dataframe1 (some users in Dataframe1 have multiple stop dates).
I want the output to look like the following:
Here is one way to accomplish this (make sure both date columns are already datetime):
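import pandas as pd

df = pd.merge(df1, df2, on="UserID")
# For each row, take the latest transaction of the same user strictly before the stop date.
# .max() on an empty selection returns NaT, so no explicit fallback is needed.
df["Last_Before_Stop"] = df.apply(
    lambda row: df.loc[
        (df["UserID"] == row["UserID"]) & (df["Transaction_Date"] < row["Stop_Date"]),
        "Transaction_Date",
    ].max(),
    axis=1,
)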
Please always provide data in a form that makes it easy to use as samples (i.e. as text, not as images; see here).
You could try:
import pandas as pd

df1["Stop_Date"] = pd.to_datetime(df1["Stop_Date"], format="%m/%d/%y")
df2["Transaction_Date"] = pd.to_datetime(df2["Transaction_Date"], format="%m/%d/%y")
df = (
    df1.merge(df2, on="UserID", how="left")
    .loc[lambda df: df["Stop_Date"] >= df["Transaction_Date"]]
    .groupby(["UserID", "Stop_Date"])["Transaction_Date"].max()
    .to_frame().reset_index().drop(columns="Stop_Date")
)
Make datetimes out of the date columns.
Merge df2 on df1 along UserID.
Remove the rows where Transaction_Date is greater than Stop_Date.
Group by UserID and Stop_Date, and fetch the maximum Transaction_Date.

Result for df1:
UserID Stop_Date
0 1 2/2/22
1 2 6/9/22
2 3 7/25/22
3 3 9/14/22
and df2:
UserID Transaction_Date
0 1 1/2/22
1 1 2/1/22
2 1 2/3/22
3 2 1/24/22
4 2 3/22/22
5 3 6/25/22
6 3 7/20/22
7 3 9/13/22
8 3 9/14/22
9 4 2/2/22
is:
UserID Transaction_Date
0 1 2022-02-01
1 2 2022-03-22
2 3 2022-07-20
3 3 2022-09-14
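
To make the steps above reproducible, here is a minimal sketch that builds df1 and df2 from the sample tables as text and runs the same merge, filter, and group pipeline. The name result is an assumption for illustration, and unlike the code above it keeps Stop_Date in the output so the two rows for UserID 3 can be told apart.

```python
import pandas as pd

# Sample data copied from the tables above, entered as text.
df1 = pd.DataFrame({
    "UserID": [1, 2, 3, 3],
    "Stop_Date": ["2/2/22", "6/9/22", "7/25/22", "9/14/22"],
})
df2 = pd.DataFrame({
    "UserID": [1, 1, 1, 2, 2, 3, 3, 3, 3, 4],
    "Transaction_Date": ["1/2/22", "2/1/22", "2/3/22", "1/24/22", "3/22/22",
                         "6/25/22", "7/20/22", "9/13/22", "9/14/22", "2/2/22"],
})

# Step 1: convert both date columns to datetime.
df1["Stop_Date"] = pd.to_datetime(df1["Stop_Date"], format="%m/%d/%y")
df2["Transaction_Date"] = pd.to_datetime(df2["Transaction_Date"], format="%m/%d/%y")

result = (
    df1.merge(df2, on="UserID", how="left")                      # Step 2: merge along UserID
    .loc[lambda df: df["Stop_Date"] >= df["Transaction_Date"]]   # Step 3: drop later transactions
    .groupby(["UserID", "Stop_Date"])["Transaction_Date"].max()  # Step 4: latest per user and stop date
    .reset_index()                                               # illustrative variation: keep Stop_Date
)
print(result)
```

Note that a transaction on the stop date itself counts here because the filter uses >=; switch to > if only strictly earlier transactions should qualify.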
If you don't want to permanently change the dtype to datetime, and also want the result as a string formatted like the input (with zero padding), then you could try:
df = (
    df1
    .assign(Stop_Date=pd.to_datetime(df1["Stop_Date"], format="%m/%d/%y"))
    .merge(
        df2.assign(Transaction_Date=pd.to_datetime(df2["Transaction_Date"], format="%m/%d/%y")),
        on="UserID", how="left"
    )
    .loc[lambda df: df["Stop_Date"] >= df["Transaction_Date"]]
    .groupby(["UserID", "Stop_Date"])["Transaction_Date"].max()
    .to_frame().reset_index().drop(columns="Stop_Date")
    .assign(Transaction_Date=lambda df: df["Transaction_Date"].dt.strftime("%m/%d/%y"))
)
Result:
UserID Transaction_Date
0 1 02/01/22
1 2 03/22/22
2 3 07/20/22
3 3 09/14/22
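
A different technique, not used in the code above, is pandas.merge_asof, which pairs each stop date with the nearest earlier-or-equal transaction of the same user. The sketch below is only one possible way to do it, under the assumption that the sample data looks like the tables above; the names left, right, and asof are illustrative.

```python
import pandas as pd

# Sample data copied from the tables above, entered as text.
df1 = pd.DataFrame({
    "UserID": [1, 2, 3, 3],
    "Stop_Date": ["2/2/22", "6/9/22", "7/25/22", "9/14/22"],
})
df2 = pd.DataFrame({
    "UserID": [1, 1, 1, 2, 2, 3, 3, 3, 3, 4],
    "Transaction_Date": ["1/2/22", "2/1/22", "2/3/22", "1/24/22", "3/22/22",
                         "6/25/22", "7/20/22", "9/13/22", "9/14/22", "2/2/22"],
})

# merge_asof requires both frames to be sorted by their "on" keys.
left = (
    df1.assign(Stop_Date=pd.to_datetime(df1["Stop_Date"], format="%m/%d/%y"))
    .sort_values("Stop_Date")
)
right = (
    df2.assign(Transaction_Date=pd.to_datetime(df2["Transaction_Date"], format="%m/%d/%y"))
    .sort_values("Transaction_Date")
)

asof = pd.merge_asof(
    left,
    right,
    left_on="Stop_Date",
    right_on="Transaction_Date",
    by="UserID",               # match only within the same user
    direction="backward",      # look for the latest transaction at or before the stop date
    allow_exact_matches=True,  # set to False to require a strictly earlier transaction
)
print(asof)
```

Because merge_asof behaves like a left join, a user with no transaction on or before the stop date stays in the output with NaT, whereas the filter-and-group approach drops such rows.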