How do I get the first non-null value from multiple columns based on another datetime column order and grouped by ID?
What I've got right now is a DataFrame like this:
id ts site type
0 111 2022-07-25 19:07:00.938365 A NaN
1 111 2022-07-25 19:07:00.938371 NaN 1.0
2 222 2022-07-25 19:07:00.938372 NaN NaN
3 222 2022-07-25 19:07:00.938373 NaN 2.0
4 222 2022-07-25 19:07:00.938374 C 1.0
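For anyone who wants to reproduce this, the frame above can be rebuilt roughly as follows. This is a minimal sketch: the dtypes (datetime64 for ts, object for site, float for type) are assumed from the printed output, not confirmed.

```python
import numpy as np
import pandas as pd

# Sample data copied from the printed frame; dtypes are an assumption.
df = pd.DataFrame({
    "id": [111, 111, 222, 222, 222],
    "ts": pd.to_datetime([
        "2022-07-25 19:07:00.938365",
        "2022-07-25 19:07:00.938371",
        "2022-07-25 19:07:00.938372",
        "2022-07-25 19:07:00.938373",
        "2022-07-25 19:07:00.938374",
    ]),
    "site": ["A", np.nan, np.nan, np.nan, "C"],
    "type": [np.nan, 1.0, np.nan, 2.0, 1.0],
})
```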
What I'm trying to do is get the first non-null values of site and type for each id, based on the descending order of ts.
So my expected output is something like:
id site type
0 111 A 1.0
1 222 C 1.0
I've tried to do this:
df_grouped = df.sort_values(by="ts", ascending=False).groupby("id").ffill().first()
> TypeError: first() missing 1 required positional argument: 'offset'
I've also tried this:
df_grouped[["site", "type"]].apply(lambda x: x.first_valid_index()).reset_index()
index 0
0 site 0
1 screen_type 0
You can do it like this:
df = df.sort_values('ts', ascending=False)
df.groupby('id', as_index=False)[['site', 'type']].agg(lambda x: x.dropna().iloc[0])
or using first_valid_index:
df.groupby('id', as_index=False)[['site', 'type']].agg(lambda x: x[x.first_valid_index()])
Output:
id site type
0 111 A 1.0
1 222 C 1.0
Note: if all values of 'site' or 'type' are NaN for some id, neither version works, because there is no first valid value to pick. If an entire column is NaN, you probably don't need to do this at all.
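If all-NaN groups can occur, one way around the limitation is a small helper that falls back to NaN. This is only a sketch: first_valid is a hypothetical name, and the extra id 333 row is made up to show the fallback.

```python
import numpy as np
import pandas as pd

def first_valid(s):
    """Return the first non-null value of s, or NaN if every value is null."""
    valid = s.dropna()
    return valid.iloc[0] if not valid.empty else np.nan

# Sample data from the question plus a made-up id 333 whose 'site' is all NaN.
df = pd.DataFrame({
    "id": [111, 111, 222, 222, 222, 333],
    "ts": pd.to_datetime([
        "2022-07-25 19:07:00.938365",
        "2022-07-25 19:07:00.938371",
        "2022-07-25 19:07:00.938372",
        "2022-07-25 19:07:00.938373",
        "2022-07-25 19:07:00.938374",
        "2022-07-25 19:07:00.938375",
    ]),
    "site": ["A", np.nan, np.nan, np.nan, "C", np.nan],
    "type": [np.nan, 1.0, np.nan, 2.0, 1.0, 3.0],
})

out = (df.sort_values("ts", ascending=False)
         .groupby("id")[["site", "type"]]
         .agg(first_valid)
         .reset_index())
```

Here id 333 gets NaN for site instead of raising an error, and the rows for 111 and 222 match the output above.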
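# Backfill inside each group only (a frame-wide bfill() could pull values
# from another id), then take the first row of the group.
(df.sort_values('ts', ascending=False).groupby('id')[['site', 'type']]
 .agg(lambda x: x.bfill().iloc[0]).reset_index())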
id site type
0 111 A 1.0
1 222 C 1.0
Note that if you are sure there is at least one non-NaN value per id, you can simply use first(), which picks the first non-null entry of each column within each group:
(df.sort_values('ts', ascending=False).groupby('id')[['site', 'type']]
 .first().reset_index())
id site type
0 111 A 1.0
1 222 C 1.0
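One thing worth checking on your own data: GroupBy.first() already returns the first non-null entry per column within each group, so no fill step is needed before it. A frame-wide bfill() ahead of the groupby can even copy a value from one id into another when the ids alternate in ts order. A small sketch with made-up data (not the question's frame) to show the difference:

```python
import numpy as np
import pandas as pd

# Hypothetical data: ids 222 and 111 alternate in time.
df = pd.DataFrame({
    "id": [222, 111, 222],
    "ts": pd.to_datetime([
        "2022-07-25 19:07:03",
        "2022-07-25 19:07:02",
        "2022-07-25 19:07:01",
    ]),
    "site": [np.nan, "A", "C"],
})
ordered = df.sort_values("ts", ascending=False)

# first() skips NaN inside each group: 111 -> A, 222 -> C
per_group = ordered.groupby("id")[["site"]].first().reset_index()

# A frame-wide bfill() first fills the newest 222 row from id 111's "A",
# so 222 ends up with A instead of C.
leaky = ordered.bfill().groupby("id")[["site"]].first().reset_index()
```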