Groupby first by a conditional value

I have a pandas dataframe, like this:

ID  date        status
10  2022-01-01  0
10  2022-01-02  0
10  2022-01-03  1
10  2022-01-04  1
10  2022-01-05  1
23  2022-02-02  0
23  2022-02-03  0
23  2022-02-04  1
23  2022-02-05  1
23  2022-02-06  1

I would like to group by ID and get the first date on which status equals 1.

Expected output:

ID  date        status  first_status
10  2022-01-03  1       2022-01-03
23  2022-02-04  1       2022-02-04

After this, I will merge this new DF with the previous DF. Final DF:

ID  date        status  first_status
10  2022-01-01  0       2022-01-03
10  2022-01-02  0       2022-01-03
10  2022-01-03  1       2022-01-03
10  2022-01-04  1       2022-01-03
10  2022-01-05  1       2022-01-03
23  2022-02-02  0       2022-02-04
23  2022-02-03  0       2022-02-04
23  2022-02-04  1       2022-02-04
23  2022-02-05  1       2022-02-04
23  2022-02-06  1       2022-02-04

I tried many ways to do this, but without success.

Get the first date for status=1 for each ID. Then map each ID to the first date:

import pandas as pd

# convert to datetime if needed
df["date"] = pd.to_datetime(df["date"])

# earliest status=1 date per ID, mapped back onto every row of that ID
df["first_status"] = df["ID"].map(df[df["status"].eq(1)].groupby("ID")["date"].min())

>>> df
   ID       date  status first_status
0  10 2022-01-01       0   2022-01-03
1  10 2022-01-02       0   2022-01-03
2  10 2022-01-03       1   2022-01-03
3  10 2022-01-04       1   2022-01-03
4  10 2022-01-05       1   2022-01-03
5  23 2022-02-02       0   2022-02-04
6  23 2022-02-03       0   2022-02-04
7  23 2022-02-04       1   2022-02-04
8  23 2022-02-05       1   2022-02-04
9  23 2022-02-06       1   2022-02-04
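For anyone who wants to reproduce this, here is a minimal self-contained sketch of the same map approach. The sample frame is shortened, and the extra ID 99 (which never reaches status 1) is an assumption added only to show what happens in that case: the lookup finds no entry, so those rows should end up with NaT.

```python
import pandas as pd

# Shortened sample data; ID 99 is a hypothetical ID that never has status 1.
df = pd.DataFrame({
    "ID": [10, 10, 10, 23, 23, 23, 99],
    "date": pd.to_datetime([
        "2022-01-02", "2022-01-03", "2022-01-04",
        "2022-02-03", "2022-02-04", "2022-02-05",
        "2022-03-01",
    ]),
    "status": [0, 1, 1, 0, 1, 1, 0],
})

# Series indexed by ID holding the earliest date with status == 1.
first_dates = df[df["status"].eq(1)].groupby("ID")["date"].min()

# Look up each row's ID in that Series; IDs missing from it become NaT.
df["first_status"] = df["ID"].map(first_dates)
```

If rows without any status=1 date should be treated differently, they can be picked out afterwards with df["first_status"].isna().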

You can filter the status 1 rows and get the first (or min, depending on the use case) date per group, then merge into the original dataframe:

df2 = (df[df['status'].eq(1)]
       .groupby('ID', as_index=False)
       ['date'].first() # could also use "min()"
       .rename(columns={'date': 'first_status'})
      )

df.merge(df2, on='ID')

Output:

   ID        date  status first_status
0  10  2022-01-01       0   2022-01-03
1  10  2022-01-02       0   2022-01-03
2  10  2022-01-03       1   2022-01-03
3  10  2022-01-04       1   2022-01-03
4  10  2022-01-05       1   2022-01-03
5  23  2022-02-02       0   2022-02-04
6  23  2022-02-03       0   2022-02-04
7  23  2022-02-04       1   2022-02-04
8  23  2022-02-05       1   2022-02-04
9  23  2022-02-06       1   2022-02-04

Intermediate df2:

   ID first_status
0  10   2022-01-03
1  23   2022-02-04
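Two details of the merge approach may matter on real data, so here is a small sketch of both. The sample frame is made up for illustration: ID 10 is deliberately not sorted by date, and ID 99 is a hypothetical ID with no status 1 row. first() takes the first row in the existing row order, while min() takes the earliest date, and merge defaults to an inner join, so IDs absent from df2 are dropped unless how="left" is passed.

```python
import pandas as pd

# Made-up data: ID 10 is out of date order, ID 99 never has status 1.
df = pd.DataFrame({
    "ID": [10, 10, 10, 23, 23, 99],
    "date": pd.to_datetime([
        "2022-01-04", "2022-01-03", "2022-01-02",
        "2022-02-03", "2022-02-04", "2022-03-01",
    ]),
    "status": [1, 1, 0, 0, 1, 0],
})

ones = df[df["status"].eq(1)]

# first() follows row order, min() follows the date values.
by_first = ones.groupby("ID")["date"].first()
by_min = ones.groupby("ID")["date"].min()

# Lookup table with one row per ID, same shape as the intermediate df2.
df2 = by_min.rename("first_status").reset_index()

# Default inner join drops ID 99; a left join keeps it with NaT.
inner = df.merge(df2, on="ID")
left = df.merge(df2, on="ID", how="left")
```

If the frame is already sorted by date within each ID, as in the question, first() and min() should give the same result.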
