Group by first value meeting a condition

Question

I have a pandas DataFrame like this:

ID	date	status
10	2022-01-01	0
10	2022-01-02	0
10	2022-01-03	1
10	2022-01-04	1
10	2022-01-05	1
23	2022-02-02	0
23	2022-02-03	0
23	2022-02-04	1
23	2022-02-05	1
23	2022-02-06	1

I would like to group by ID and get, for each ID, the first date on which status equals 1.

Expected output:

ID	date	status	first_status
10	2022-01-03	1	2022-01-03
23	2022-02-04	1	2022-02-04

After this I will merge the new DataFrame with the original one. Final DataFrame:

ID	date	status	first_status
10	2022-01-01	0	2022-01-03
10	2022-01-02	0	2022-01-03
10	2022-01-03	1	2022-01-03
10	2022-01-04	1	2022-01-03
10	2022-01-05	1	2022-01-03
23	2022-02-02	0	2022-02-04
23	2022-02-03	0	2022-02-04
23	2022-02-04	1	2022-02-04
23	2022-02-05	1	2022-02-04
23	2022-02-06	1	2022-02-04

I tried many ways to do this, but without success.

Answer 1

Get the first date with status=1 for each ID, then map each ID to that date:

#convert to datetime if needed
df["date"] = pd.to_datetime(df["date"])

df["first_status"] = df["ID"].map(df[df["status"].eq(1)].groupby("ID")["date"].min())

>>> df
   ID       date  status first_status
0  10 2022-01-01       0   2022-01-03
1  10 2022-01-02       0   2022-01-03
2  10 2022-01-03       1   2022-01-03
3  10 2022-01-04       1   2022-01-03
4  10 2022-01-05       1   2022-01-03
5  23 2022-02-02       0   2022-02-04
6  23 2022-02-03       0   2022-02-04
7  23 2022-02-04       1   2022-02-04
8  23 2022-02-05       1   2022-02-04
9  23 2022-02-06       1   2022-02-04
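For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same map approach. The data is the sample from the question plus one extra row, ID 99, which is a hypothetical addition (not in the original) used only to show what happens when an ID never reaches status 1.

```python
import pandas as pd

# Sample data from the question, plus a hypothetical ID 99
# (an assumption for illustration) that never reaches status 1.
df = pd.DataFrame({
    "ID": [10] * 5 + [23] * 5 + [99],
    "date": pd.to_datetime([
        "2022-01-01", "2022-01-02", "2022-01-03", "2022-01-04", "2022-01-05",
        "2022-02-02", "2022-02-03", "2022-02-04", "2022-02-05", "2022-02-06",
        "2022-03-01",
    ]),
    "status": [0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0],
})

# Series indexed by ID holding the earliest date with status == 1.
first_dates = df.loc[df["status"].eq(1)].groupby("ID")["date"].min()

# Map each row's ID to that date; IDs absent from first_dates get a missing value.
df["first_status"] = df["ID"].map(first_dates)

print(df)
```

Because map looks each ID up in the grouped Series, every original row is kept, and rows for an ID with no status=1 entry end up with a missing first_status rather than being dropped.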

Answer 2

You can filter the rows with status 1, get the first (or min, depending on the use case) date per group, then merge the result back into the original DataFrame:

df2 = (df[df['status'].eq(1)]
       .groupby('ID', as_index=False)
       ['date'].first() # could also use "min()"
       .rename(columns={'date': 'first_status'})
      )

df.merge(df2, on='ID')

Output:

   ID        date  status first_status
0  10  2022-01-01       0   2022-01-03
1  10  2022-01-02       0   2022-01-03
2  10  2022-01-03       1   2022-01-03
3  10  2022-01-04       1   2022-01-03
4  10  2022-01-05       1   2022-01-03
5  23  2022-02-02       0   2022-02-04
6  23  2022-02-03       0   2022-02-04
7  23  2022-02-04       1   2022-02-04
8  23  2022-02-05       1   2022-02-04
9  23  2022-02-06       1   2022-02-04

Intermediate df2:

   ID first_status
0  10   2022-01-03
1  23   2022-02-04
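Two details of the merge approach may be worth checking on your own data: first() follows row order while min() follows date order, and a default (inner) merge drops any ID that has no status=1 row. The sketch below uses small hypothetical data (not from the question) that is deliberately unsorted and includes an ID 7 that never reaches status 1; how="left" is one possible way to keep such rows.

```python
import pandas as pd

# Hypothetical data for illustration: dates for ID 10 are out of order,
# and ID 7 never reaches status 1.
df = pd.DataFrame({
    "ID": [10, 10, 10, 7],
    "date": pd.to_datetime(["2022-01-05", "2022-01-03", "2022-01-01", "2022-01-02"]),
    "status": [1, 1, 0, 0],
})

status_one = df[df["status"].eq(1)]

# first() returns the first row in the current order; min() the earliest date.
by_first = status_one.groupby("ID")["date"].first()
by_min = status_one.groupby("ID")["date"].min()

# Lookup table with columns ID and first_status.
df2 = by_min.rename("first_status").reset_index()

inner = df.merge(df2, on="ID")              # default inner join: ID 7 is dropped
left = df.merge(df2, on="ID", how="left")   # ID 7 is kept, first_status is missing

print(by_first, by_min, inner, left, sep="\n\n")
```

If the DataFrame is already sorted by date within each ID, first() and min() give the same result; sorting with sort_values first, or using min(), avoids depending on row order.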


Answer 1
1 accepted 2022-04-28 13:36:48

Answer 2
1 2022-04-28 13:36:51
