Return the last valid (non-null) value from a pandas DataFrame

Question

Suppose I have a DataFrame that looks like this:

      a      b
0    11      A
1    -2      A
2     3      A
3    NA      A
4   0.5      B
5    NA      B
6    -9      B

I can group by 'b'. Is there a fast way to get the last non-NA value of 'a' in each group? In this case that would be 3 for group A and -9 for group B.

(Here the series 'a' is already in the desired order, but that might not always be the case. There could be another column 'c' that defines which value counts as the 'last' one.)

I wrote my own loop by iterating over the grouped.groups dict, but that is very inefficient on my huge dataset. I think this could be done very straightforwardly; maybe I am just too much of a novice with pandas :-)

Answer 1

I recently opened a GitHub issue for this: https://github.com/pydata/pandas/issues/1043

In the meantime, you could do:

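def get_last_valid(series):
    # Drop NA values, then take the last remaining entry by position.
    # (.iloc[-1] replaces the older Series.iget(-1), which was removed from pandas.)
    return series.dropna().iloc[-1]

df.groupby('b')['a'].apply(get_last_valid)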
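
For anyone who wants to try this end to end, here is a minimal, self-contained sketch under a few assumptions: the frame is rebuilt from the question's table, the helper returns NaN for a group with no valid values (a guard the original snippet does not have), and the ordering column `c` with its values is purely hypothetical, made up to illustrate the "another column defines 'last'" case. It also shows the built-in `GroupBy.last()`, which in current pandas skips NA values by default and may be enough on its own.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "a": [11, -2, 3, np.nan, 0.5, np.nan, -9],
    "b": ["A", "A", "A", "A", "B", "B", "B"],
})


def get_last_valid(series):
    """Return the last non-NA value of a Series, or NaN if there is none."""
    valid = series.dropna()
    return valid.iloc[-1] if not valid.empty else np.nan


# Option 1: apply the helper to each group.
last_by_apply = df.groupby("b")["a"].apply(get_last_valid)

# Option 2: the built-in GroupBy.last(), which skips NA values by default.
last_builtin = df.groupby("b")["a"].last()

# If a separate column defines what "last" means, sort by it first.
# The column "c" and its values are hypothetical, for illustration only.
df_c = df.assign(c=[3, 1, 2, 0, 2, 1, 0])
last_by_c = df_c.sort_values("c").groupby("b")["a"].last()

print(last_by_apply)
print(last_builtin)
print(last_by_c)
```

On the question's data, both options give 3 for group A and -9 for group B. The built-in aggregation avoids a Python-level function call per group, so it is likely the better fit for a very large dataset, though that is worth measuring on your own data.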

Answer 1
4 2012-04-18 01:07:15
