
Find 'next' non-null value in column

I am trying to create a new column that, for each non-null value, holds the row index of the next non-null value, using the following df as a starting point:

from datetime import datetime

import numpy as np
import pandas as pd

ts = [
        datetime.strptime('2016-06-19 22:01:22.229', '%Y-%m-%d %H:%M:%S.%f'),
        datetime.strptime('2016-06-19 23:32:08.109', '%Y-%m-%d %H:%M:%S.%f'),
        datetime.strptime('2016-06-20 02:50:22.181', '%Y-%m-%d %H:%M:%S.%f'),
        datetime.strptime('2016-06-20 06:12:44.249', '%Y-%m-%d %H:%M:%S.%f'),
        datetime.strptime('2016-06-20 19:27:22.129', '%Y-%m-%d %H:%M:%S.%f'),
        datetime.strptime('2016-06-21 11:39:08.119', '%Y-%m-%d %H:%M:%S.%f'),
        datetime.strptime('2016-06-21 16:59:22.610', '%Y-%m-%d %H:%M:%S.%f')
        ]

score = [ np.nan, np.nan, 77.692, np.nan, 77.709, np.nan, 76.145 ]


df = pd.DataFrame(data={'date': ts, 'jack': score})

The desired output would look like this:

                     date    jack  nxt_nn_value
0 2016-06-19 22:01:22.229     NaN           NaN
1 2016-06-19 23:32:08.109     NaN           NaN
2 2016-06-20 02:50:22.181  77.692             4
3 2016-06-20 06:12:44.249     NaN           NaN
4 2016-06-20 19:27:22.129  77.709             6
5 2016-06-21 11:39:08.119     NaN           NaN
6 2016-06-21 16:59:22.610  76.145           NaN

I found a related question, "How can I get the index of next non-NaN number with series in pandas?", that touches on the issue, but I could not adapt it to suit my case.

Thank you in advance for any help you can provide.

We can find the notnull values in jack, take their index labels, and shift them up by one. Then use loc to assign the results only to the non-null rows, with .values to break index alignment:

m = df['jack'].notnull()
df.loc[m, 'nxt_nn_value'] = pd.Series(m[m].index).shift(-1).values

Resulting df:

                     date    jack  nxt_nn_value
0 2016-06-19 22:01:22.229     NaN           NaN
1 2016-06-19 23:32:08.109     NaN           NaN
2 2016-06-20 02:50:22.181  77.692           4.0
3 2016-06-20 06:12:44.249     NaN           NaN
4 2016-06-20 19:27:22.129  77.709           6.0
5 2016-06-21 11:39:08.119     NaN           NaN
6 2016-06-21 16:59:22.610  76.145           NaN
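For anyone who wants to run it end to end, here is a self-contained sketch that rebuilds the question's frame and applies the two lines above. The final astype('Int64') step is an addition, not part of the original answer: because the new column contains NaN it comes out as float64 (4.0, 6.0), and casting to pandas' nullable integer dtype (assuming pandas 1.0 or later) displays 4 and 6 as in the desired output, with <NA> for the missing entries.

```python
from datetime import datetime

import numpy as np
import pandas as pd

# Rebuild the question's frame
fmt = '%Y-%m-%d %H:%M:%S.%f'
ts = [datetime.strptime(s, fmt) for s in [
    '2016-06-19 22:01:22.229',
    '2016-06-19 23:32:08.109',
    '2016-06-20 02:50:22.181',
    '2016-06-20 06:12:44.249',
    '2016-06-20 19:27:22.129',
    '2016-06-21 11:39:08.119',
    '2016-06-21 16:59:22.610',
]]
score = [np.nan, np.nan, 77.692, np.nan, 77.709, np.nan, 76.145]
df = pd.DataFrame(data={'date': ts, 'jack': score})

# The answer's two lines: mask of non-null rows, then shifted labels assigned positionally
m = df['jack'].notnull()
df.loc[m, 'nxt_nn_value'] = pd.Series(m[m].index).shift(-1).values

# Optional extra (not in the original answer): nullable integers so 4.0/6.0 show as 4/6
df['nxt_nn_value'] = df['nxt_nn_value'].astype('Int64')
print(df)
```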

Explanation:

First find where the values in jack are notnull:

m = df['jack'].notnull()

0    False
1    False
2     True
3    False
4     True
5    False
6     True
Name: jack, dtype: bool
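As a small aside on this step, sketched on a cut-down frame (only jack, the date column omitted since it plays no part): notnull() is an alias of notna(), and the resulting boolean Series can be used directly to select the non-null rows.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'jack': [np.nan, np.nan, 77.692, np.nan, 77.709, np.nan, 76.145]})
m = df['jack'].notnull()

# notnull() and notna() are aliases and produce the same boolean mask
assert m.equals(df['jack'].notna())

# The mask selects the non-null rows directly; True counts as 1 when summed
print(df[m])
print(int(m.sum()))  # 3
```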

Filter the mask by itself to get the index (there are many equivalent ways to do this):

m[m].index

Int64Index([2, 4, 6], dtype='int64')
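A few of those equivalent ways, sketched on the same cut-down frame (only jack); all three return the labels of the non-null rows:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'jack': [np.nan, np.nan, 77.692, np.nan, 77.709, np.nan, 76.145]})
m = df['jack'].notnull()

a = m[m].index                  # filter the mask by itself (as in the answer)
b = df.index[m.to_numpy()]      # boolean-index the DataFrame's own index
c = df['jack'].dropna().index   # drop the NaNs and keep the surviving labels

print(a.tolist(), b.tolist(), c.tolist())  # [2, 4, 6] [2, 4, 6] [2, 4, 6]
```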

Then convert the index to a Series so it can be shifted; shifting by -1 gives each non-null row the "next" index:

pd.Series(m[m].index).shift(-1)

0    4.0
1    6.0
2    NaN
dtype: float64
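To see what shift(-1) does in isolation: a negative periods value moves values up (each row sees the following one), a positive one moves them down. Note also that a Series built from the index gets a fresh RangeIndex 0, 1, 2 rather than the labels 2, 4, 6, which is why the next step needs .values.

```python
import pandas as pd

idx = pd.Index([2, 4, 6])  # labels of the non-null rows
s = pd.Series(idx)         # values 2, 4, 6 on a fresh RangeIndex 0, 1, 2

up = s.shift(-1)   # negative periods: each row gets the next label
down = s.shift(1)  # positive periods: each row gets the previous label

print(up.tolist())       # [4.0, 6.0, nan]
print(down.tolist())     # [nan, 2.0, 4.0]
print(s.index.tolist())  # [0, 1, 2]
```

The NaN introduced at the edge is also why the result is float64 rather than int64.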

Finally, assign back to the DataFrame only where values are notnull. The .values is needed to break index alignment between the Series (indexed 0, 1, 2) and the DataFrame rows (2, 4, 6):

df.loc[m, 'nxt_nn_value'] = pd.Series(m[m].index).shift(-1).values
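The sketch below, on the same cut-down frame, contrasts assigning the shifted Series with and without .values. The last two columns are alternatives that are not from the original answer: by_label keeps the row labels as the Series index (via Index.to_series()) so alignment works in our favor, and by_bfill marks each non-null row with its own label, shifts up and back-fills so every row sees the first non-null label below it, then keeps only the non-null rows. Both assume the index labels are what should be reported, as with the question's default RangeIndex. The columns are pre-filled with NaN so the only difference is the assignment.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'jack': [np.nan, np.nan, 77.692, np.nan, 77.709, np.nan, 76.145]})
m = df['jack'].notnull()
nxt = pd.Series(m[m].index).shift(-1)  # values [4.0, 6.0, NaN] on index 0, 1, 2
df['aligned'] = np.nan
df['positional'] = np.nan

# Without .values, pandas aligns nxt's index (0, 1, 2) to the selected rows (2, 4, 6):
# row 2 picks up nxt[2] (NaN) and rows 4 and 6 have no match, so all stay NaN
df.loc[m, 'aligned'] = nxt

# With .values, the plain array is written positionally into rows 2, 4, 6
df.loc[m, 'positional'] = nxt.values

# Alternative 1: Series indexed by the labels themselves, then aligned on assignment
labels = df['jack'].dropna().index
df['by_label'] = labels.to_series().shift(-1)

# Alternative 2: label on non-null rows, shift up, back-fill, keep non-null rows only
df['by_bfill'] = df.index.to_series().where(m).shift(-1).bfill().where(m)
print(df)
```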
