Returning the last valid (non-null) value from a pandas DataFrame

Question

Suppose I have a DataFrame that looks like this:

      a      b
0    11      A
1    -2      A
2     3      A
3    NA      A
4   0.5      B
5    NA      B
6    -9      B

I can group by 'b'. Is there a fast way to get the last non-NA value in 'a' for each group? In this case it would be 3 for group A and -9 for group B.

(In this case the series 'a' is already in the desired order as given, but that might not always be so. There could be another column 'c' according to which the 'last' is defined.)

I wrote my own loop over the grouped.groups dict, but that is very inefficient on my huge dataset. I think this could be done much more directly. Maybe I am just too new to pandas :-)

Answer 1

I opened a GitHub issue for this recently: https://github.com/pydata/pandas/issues/1043

In the meantime, you could do:

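def get_last_valid(series):
    # Drop NA values, then take the last remaining value by position.
    # .iloc[-1] replaces .iget(-1), which later pandas versions removed.
    return series.dropna().iloc[-1]

df.groupby('b')['a'].apply(get_last_valid)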
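
As a possible alternative on more recent pandas versions, the built-in groupby `last()` skips NA values by default, so a custom function may not be needed. The sketch below rebuilds the example frame from the question. The column 'c' and its values are made up purely to illustrate the "last according to another column" case, and `sort_values` is assumed to be available in your pandas version.

```python
import numpy as np
import pandas as pd

# Example frame from the question; column 'c' is hypothetical and only
# illustrates ordering by another column.
df = pd.DataFrame({
    "a": [11, -2, 3, np.nan, 0.5, np.nan, -9],
    "b": ["A", "A", "A", "A", "B", "B", "B"],
    "c": [1, 2, 3, 4, 7, 6, 5],
})

# Last non-NA value of 'a' per group, in the existing row order.
last_by_position = df.groupby("b")["a"].last()

# Last non-NA value of 'a' per group, where "last" is defined by column 'c':
# sort by 'c' first, then take the last non-NA value within each group.
last_by_c = df.sort_values("c").groupby("b")["a"].last()

print(last_by_position)
print(last_by_c)
```

Because this avoids calling a Python function once per group, it is likely to be faster on a large dataset, though that is worth measuring on your own data.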

solution1
4 2012-04-18 01:07:15
