
Efficient way to select most recent index with finite value in column from Pandas DataFrame?

I'm trying to find the most recent index with a value that is not 'NaN' relative to the current index. So, say I have a DataFrame with 'NaN' values like this:

       A       B       C
0    2.1     5.3     4.7
1    5.1     4.6     NaN
2    5.0     NaN     NaN
3    7.4     NaN     NaN
4    3.5     NaN     NaN
5    5.2     1.0     NaN
6    5.0     6.9     5.4
7    7.4     NaN     NaN
8    3.5     NaN     5.8

If I am currently at index 4, I have the values:

       A       B       C
4    3.5     NaN     NaN

I want to know the last known value of 'B' relative to index 4, which is at index 1:

       A       B       C
1    5.1   -> 4.6    NaN

I know I can get a list of all indexes with NaN values using something like:

indexes = df.index[df['B'].apply(np.isnan)]

But this seems inefficient for a large DataFrame. Is there a way to get just the most recent one relative to the current index?

You can try something like this: convert the index to a Series, mask it wherever column B is NaN, and then use ffill(), which carries the last non-missing index forward over all subsequent NaNs:

import pandas as pd
import numpy as np
df['Last_index_notnull'] = df.index.to_series().where(df.B.notnull(), np.nan).ffill()
df['Last_value_notnull'] = df.B.ffill()
df


Now, at index 4, you know that the last non-missing value of B is 4.6 and that it sits at index 1.
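As a self-contained check of the approach above, here is a minimal sketch that rebuilds the example frame from the question and reads off the result at index 4. The variable names `last_idx` and `last_val` are only illustrative; note that masking the index with NaN turns it into floats.

```python
import numpy as np
import pandas as pd

nan = np.nan
# Example data copied from the question.
df = pd.DataFrame({
    "A": [2.1, 5.1, 5.0, 7.4, 3.5, 5.2, 5.0, 7.4, 3.5],
    "B": [5.3, 4.6, nan, nan, nan, 1.0, 6.9, nan, nan],
    "C": [4.7, nan, nan, nan, nan, nan, 5.4, nan, 5.8],
})

# Keep the index label only where B is present, then carry it forward.
last_idx = df.index.to_series().where(df["B"].notna()).ffill()
# Carry the last present value of B forward in the same way.
last_val = df["B"].ffill()

print(last_idx.loc[4], last_val.loc[4])
```

Both Series are computed once for the whole column, so looking up any "current" index afterwards is a plain label lookup.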

Some useful methods to know:

last_valid_index
first_valid_index

For column B as of index 4:

df.B.loc[:4].last_valid_index()

1
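For a single lookup, this can be wrapped in a small helper. The sketch below assumes a hypothetical function name `last_known` (not part of pandas) and relies on `.loc[:current]` being label-based and inclusive of `current`; `last_valid_index()` returns None when no valid value exists in the slice.

```python
import numpy as np
import pandas as pd

nan = np.nan
# Example data copied from the question.
df = pd.DataFrame({
    "A": [2.1, 5.1, 5.0, 7.4, 3.5, 5.2, 5.0, 7.4, 3.5],
    "B": [5.3, 4.6, nan, nan, nan, 1.0, 6.9, nan, nan],
    "C": [4.7, nan, nan, nan, nan, nan, 5.4, nan, 5.8],
})

def last_known(frame, column, current):
    """Hypothetical helper: (index, value) of the last non-NaN entry
    in `column` at or before the label `current`, or (None, None)."""
    idx = frame[column].loc[:current].last_valid_index()
    if idx is None:
        return None, None
    return idx, frame.at[idx, column]

print(last_known(df, "B", 4))
```

This only scans the slice up to the current index, which may be preferable when just one or a few lookups are needed.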

You can apply this to all columns like this:

pd.concat([df.loc[:i].apply(pd.Series.last_valid_index) for i in df.index],
          axis=1).T
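The loop above re-slices the frame once per row, which may become slow on large data. As a possible alternative, and not the answerer's own method, the masked-index-plus-ffill idea can be applied column by column in one pass. The name `last_idx_all` is illustrative, and the resulting labels come out as floats because of the NaN masking.

```python
import numpy as np
import pandas as pd

nan = np.nan
# Example data copied from the question.
df = pd.DataFrame({
    "A": [2.1, 5.1, 5.0, 7.4, 3.5, 5.2, 5.0, 7.4, 3.5],
    "B": [5.3, 4.6, nan, nan, nan, 1.0, 6.9, nan, nan],
    "C": [4.7, nan, nan, nan, nan, nan, 5.4, nan, 5.8],
})

# For each column: keep the index label where the value is present,
# then forward-fill so every row holds the last valid label so far.
last_idx_all = df.apply(
    lambda col: col.index.to_series().where(col.notna()).ffill()
)

print(last_idx_all)
```

Rows that come before the first valid value of a column stay NaN, since there is nothing to carry forward yet.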

