简体   繁体   中英

Returning the last valid (non-null) value from a pandas DataFrame

Suppose I have a dataframe looks like:

      a      b
0    11      A
1    -2      A
2     3      A
3    NA      A
4   0.5      B
5    NA      B
6    -9      B

I can create a group by 'b'. Is there a fast way to get the last non-NA value in 'a' of each group? In this case would be 3 for group A and -9 for group B.

(In this case the series 'a' is sorted as given, but it might not be the case. There could be another column 'c', according which the 'last' is defined.)

I wrote my own loop code by looking into the grouped.groups dict. But apparently that's very inefficient given my huge dataset. I think this could be done very straightforwardly -- maybe I am just too novice with pandas :-)

I added a github issue for this recently: https://github.com/pydata/pandas/issues/1043

In the meantime, you could do:

def get_last_valid(series):
    return series.dropna().iget(-1)

df.groupby('b')['a'].apply(get_last_valid)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM