Groupby first by a conditional value
I have a pandas dataframe like this:

| ID | date | status |
|---|---|---|
| 10 | 2022-01-01 | 0 |
| 10 | 2022-01-02 | 0 |
| 10 | 2022-01-03 | 1 |
| 10 | 2022-01-04 | 1 |
| 10 | 2022-01-05 | 1 |
| 23 | 2022-02-02 | 0 |
| 23 | 2022-02-03 | 0 |
| 23 | 2022-02-04 | 1 |
| 23 | 2022-02-05 | 1 |
| 23 | 2022-02-06 | 1 |

I would like to group by ID and get the first date on which status equals 1.
Expected output:

| ID | date | status | first_status |
|---|---|---|---|
| 10 | 2022-01-03 | 1 | 2022-01-03 |
| 23 | 2022-02-04 | 1 | 2022-02-04 |

After this I will merge this new DF with the previous DF. Final DF:

| ID | date | status | first_status |
|---|---|---|---|
| 10 | 2022-01-01 | 0 | 2022-01-03 |
| 10 | 2022-01-02 | 0 | 2022-01-03 |
| 10 | 2022-01-03 | 1 | 2022-01-03 |
| 10 | 2022-01-04 | 1 | 2022-01-03 |
| 10 | 2022-01-05 | 1 | 2022-01-03 |
| 23 | 2022-02-02 | 0 | 2022-02-04 |
| 23 | 2022-02-03 | 0 | 2022-02-04 |
| 23 | 2022-02-04 | 1 | 2022-02-04 |
| 23 | 2022-02-05 | 1 | 2022-02-04 |
| 23 | 2022-02-06 | 1 | 2022-02-04 |

I tried many ways to do this, but without success.
Get the first date with status=1 for each ID, then map each ID to that date:

    import pandas as pd

    # convert to datetime if needed
    df["date"] = pd.to_datetime(df["date"])
    # earliest status-1 date per ID, looked up for every row by its ID
    df["first_status"] = df["ID"].map(df[df["status"].eq(1)].groupby("ID")["date"].min())

    >>> df
       ID       date  status first_status
    0  10 2022-01-01       0   2022-01-03
    1  10 2022-01-02       0   2022-01-03
    2  10 2022-01-03       1   2022-01-03
    3  10 2022-01-04       1   2022-01-03
    4  10 2022-01-05       1   2022-01-03
    5  23 2022-02-02       0   2022-02-04
    6  23 2022-02-03       0   2022-02-04
    7  23 2022-02-04       1   2022-02-04
    8  23 2022-02-05       1   2022-02-04
    9  23 2022-02-06       1   2022-02-04

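
As a possible variation on the same idea, the filter and the lookup can be folded into a single `groupby(...).transform`. This is only a sketch, not part of the answer above: it rebuilds the question's sample data so it runs on its own, masks the dates where `status` is not 1, and broadcasts the per-ID minimum back to every row.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": [10] * 5 + [23] * 5,
    "date": pd.to_datetime([
        "2022-01-01", "2022-01-02", "2022-01-03", "2022-01-04", "2022-01-05",
        "2022-02-02", "2022-02-03", "2022-02-04", "2022-02-05", "2022-02-06",
    ]),
    "status": [0, 0, 1, 1, 1, 0, 0, 1, 1, 1],
})

# Replace dates with NaT where status is not 1, then take the per-ID
# minimum of the remaining dates and broadcast it onto every row.
df["first_status"] = (
    df["date"].where(df["status"].eq(1))
    .groupby(df["ID"])
    .transform("min")
)
```

Because `transform` returns a result aligned with the original index, no `map` or `merge` step is needed afterwards.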
You can filter the rows with status 1, get the `first` (or `min`, depending on the use case) date per group, then `merge` the result back to the original dataframe:

    df2 = (df[df['status'].eq(1)]
           .groupby('ID', as_index=False)
           ['date'].first()  # could also use "min()"
           .rename(columns={'date': 'first_status'})
          )
    df.merge(df2, on='ID')

Output:

       ID       date  status first_status
    0  10 2022-01-01       0   2022-01-03
    1  10 2022-01-02       0   2022-01-03
    2  10 2022-01-03       1   2022-01-03
    3  10 2022-01-04       1   2022-01-03
    4  10 2022-01-05       1   2022-01-03
    5  23 2022-02-02       0   2022-02-04
    6  23 2022-02-03       0   2022-02-04
    7  23 2022-02-04       1   2022-02-04
    8  23 2022-02-05       1   2022-02-04
    9  23 2022-02-06       1   2022-02-04

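
The choice between `first()` and `min()` only matters when the rows are not already sorted by date within each ID. A small sketch with made-up, deliberately unsorted data (the frame `unsorted` is hypothetical, not from the question) shows the difference:

```python
import pandas as pd

# Hypothetical unsorted data: for ID 10 the earliest status-1 row comes second.
unsorted = pd.DataFrame({
    "ID": [10, 10, 10],
    "date": pd.to_datetime(["2022-01-05", "2022-01-03", "2022-01-01"]),
    "status": [1, 1, 0],
})

ones = unsorted[unsorted["status"].eq(1)]

by_position = ones.groupby("ID")["date"].first()  # first row in frame order
by_value = ones.groupby("ID")["date"].min()       # earliest date

print(by_position.loc[10].date())  # 2022-01-05
print(by_value.loc[10].date())     # 2022-01-03

# Sorting by date beforehand makes first() agree with min().
sorted_first = ones.sort_values("date").groupby("ID")["date"].first()
```

With the question's data, which is already in date order, both calls give the same result.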
Intermediate `df2`:

       ID first_status
    0  10   2022-01-03
    1  23   2022-02-04

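
One thing to keep in mind: `df2` only contains IDs that have at least one row with status 1, and `merge` defaults to `how="inner"`, so any ID that never reaches status 1 would disappear from the merged result. The sketch below uses hypothetical data (ID 99 is invented for illustration) to show how `how="left"` keeps those rows with `NaT` in `first_status`.

```python
import pandas as pd

# Hypothetical data: ID 99 never reaches status 1.
df = pd.DataFrame({
    "ID": [10, 10, 99, 99],
    "date": pd.to_datetime(["2022-01-01", "2022-01-02", "2022-03-01", "2022-03-02"]),
    "status": [0, 1, 0, 0],
})

df2 = (df[df["status"].eq(1)]
       .groupby("ID", as_index=False)["date"].min()
       .rename(columns={"date": "first_status"}))

inner = df.merge(df2, on="ID")             # default inner join: ID 99 rows are dropped
left = df.merge(df2, on="ID", how="left")  # ID 99 rows kept, first_status is NaT
```

Whether dropping or keeping such IDs is preferable depends on the use case; the question's sample data has a status-1 row for every ID, so both joins give the same output there.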