How can I get the first non-null value from multiple columns, grouped by ID and ordered by another datetime column?

Question

What I've got right now is a DataFrame like this:

    id  ts                          site   type
0   111 2022-07-25 19:07:00.938365  A      NaN
1   111 2022-07-25 19:07:00.938371  NaN    1.0
2   222 2022-07-25 19:07:00.938372  NaN    NaN
3   222 2022-07-25 19:07:00.938373  NaN    2.0
4   222 2022-07-25 19:07:00.938374  C      1.0

What I'm trying to do is get the first non-null values of site and type for each id, based on the descending order of ts.

So my expected output is something like:

    id  site   type
0   111 A      1.0
1   222 C      1.0

I've tried to do this:

df_grouped = df.sort_values(by="ts", ascending=False).groupby("id").ffill().first()


> TypeError: first() missing 1 required positional argument: 'offset'
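The error appears to come from the fact that `groupby("id").ffill()` returns an ordinary DataFrame rather than a grouped object, so the trailing `.first()` resolves to `DataFrame.first`, which (in the pandas version that produced this traceback) expects an `offset` argument. A minimal check, assuming the sample frame is rebuilt by hand from the table above:

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt by hand from the table in the question.
df = pd.DataFrame({
    "id": [111, 111, 222, 222, 222],
    "ts": pd.to_datetime([
        "2022-07-25 19:07:00.938365",
        "2022-07-25 19:07:00.938371",
        "2022-07-25 19:07:00.938372",
        "2022-07-25 19:07:00.938373",
        "2022-07-25 19:07:00.938374",
    ]),
    "site": ["A", np.nan, np.nan, np.nan, "C"],
    "type": [np.nan, 1.0, np.nan, 2.0, 1.0],
})

# ffill() on a groupby hands back a plain DataFrame, so the grouping is
# gone by the time .first() would be called.
filled = df.sort_values(by="ts", ascending=False).groupby("id").ffill()
print(type(filled).__name__)
```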

I've also tried this:

df_grouped[["site", "type"]].apply(lambda x: x.first_valid_index()).reset_index()



    index       0
0   site        0
1   screen_type 0

Answer 1

You can do it like this:

df = df.sort_values('ts', ascending=False)

df.groupby('id', as_index=False)[['site', 'type']].agg(lambda x: x.dropna().iloc[0])

or using first_valid_index:

df.groupby('id', as_index=False)[['site', 'type']].agg(lambda x: x[x.first_valid_index()])

Output:

    id site  type
0  111    A   1.0
1  222    C   1.0

Note: If all values in either the 'site' or the 'type' column are NaN for some id, this won't work, because there is no valid value to pick. In that case you probably don't need this step at all.
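If all-NaN groups can occur, one possible workaround is to return NaN for them instead of indexing into an empty result. This is only a sketch: the helper name `first_valid` and the extra id 333 (which has no values at all) are made up for illustration and are not part of the question.

```python
import numpy as np
import pandas as pd

# Sample frame from the question plus a hypothetical id 333 with only NaNs.
df = pd.DataFrame({
    "id": [111, 111, 222, 222, 222, 333],
    "ts": pd.to_datetime([
        "2022-07-25 19:07:00.938365",
        "2022-07-25 19:07:00.938371",
        "2022-07-25 19:07:00.938372",
        "2022-07-25 19:07:00.938373",
        "2022-07-25 19:07:00.938374",
        "2022-07-25 19:07:00.938375",
    ]),
    "site": ["A", np.nan, np.nan, np.nan, "C", np.nan],
    "type": [np.nan, 1.0, np.nan, 2.0, 1.0, np.nan],
})

def first_valid(s):
    """Return the first non-null value of s, or NaN if there is none."""
    valid = s.dropna()
    return valid.iloc[0] if not valid.empty else np.nan

out = (
    df.sort_values("ts", ascending=False)   # newest rows first
      .groupby("id")[["site", "type"]]
      .agg(first_valid)
      .reset_index()
)
print(out)
```

Ids 111 and 222 give the same values as in the output above, while id 333 comes back as NaN in both columns rather than raising.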

Answer 2

(df.sort_values('ts', ascending=False).bfill().groupby('id')[['site', 'type']]
   .agg(lambda x:x.bfill().head(1)).reset_index())

    id site  type
0  111    A   1.0
1  222    C   1.0

Note that if you are sure there is at least one non-NaN value per id, then you can do:

(df.sort_values('ts', ascending=False).bfill().groupby('id')[['site', 'type']]
   .first().reset_index())

    id site  type
0  111    A   1.0
1  222    C   1.0
