Pandas: Fill NaNs with next non-NaN / # consecutive NaNs

Question

I'm looking to take a pandas series and fill NaN with the average of the next numerical value where: average = next numerical value / (# consecutive NaNs + 1)

Here's my code so far, I just can't figure out how to divide the filler column among the NaN s (and the next numerical value as well) in num :

import pandas as pd

dates = pd.date_range(start = '1/1/2016',end = '1/12/2016', freq = 'D')
nums = [10, 12, None, None, 39, 10, 11, None, None, None, None, 60]

df = pd.DataFrame({
        'date':dates, 
        'num':nums
        })

df['filler'] = df['num'].fillna(method = 'bfill')

Current Output:

         date   num  filler
0  2016-01-01  10.0    10.0
1  2016-01-02  12.0    12.0
2  2016-01-03   NaN    39.0
3  2016-01-04   NaN    39.0
4  2016-01-05  39.0    39.0
5  2016-01-06  10.0    10.0
6  2016-01-07  11.0    11.0
7  2016-01-08   NaN    60.0
8  2016-01-09   NaN    60.0
9  2016-01-10   NaN    60.0
10 2016-01-11   NaN    60.0
11 2016-01-12  60.0    60.0

Desired Output:

         date   num
0  2016-01-01  10.0
1  2016-01-02  12.0
2  2016-01-03  13.0
3  2016-01-04  13.0
4  2016-01-05  13.0
5  2016-01-06  10.0
6  2016-01-07  11.0
7  2016-01-08  12.0
8  2016-01-09  12.0
9  2016-01-10  12.0
10 2016-01-11  12.0
11 2016-01-12  12.0

Answer 1

Take a reverse cumsum of notnull
Use that to groupby and transform with mean

csum = df.num.notnull()[::-1].cumsum()
filler = df.num.fillna(0).groupby(csum).transform('mean')
df.assign(filler=filler)

         date   num  filler
0  2016-01-01  10.0    10.0
1  2016-01-02  12.0    12.0
2  2016-01-03   NaN    13.0
3  2016-01-04   NaN    13.0
4  2016-01-05  39.0    13.0
5  2016-01-06  10.0    10.0
6  2016-01-07  11.0    11.0
7  2016-01-08   NaN    12.0
8  2016-01-09   NaN    12.0
9  2016-01-10   NaN    12.0
10 2016-01-11   NaN    12.0
11 2016-01-12  60.0    12.0

how it works

df.num.notnull().cumsum() is a standard technique to find groups of contiguous nulls. However, I wanted my groups to end with the next numeric value. So I reversed the series and then cumsum 'd.
I want my average to include the number of nulls. Easiest way to do that is to fill with zero and take a normal mean over the groups II just made.
transform to broadcast across the existing index
assign new column. Despite having reversed the series, the index will realign like magic. Could have used loc but that overwrites the existing df . I'll let OP decide to overwrite if they want to.

Pandas: Fill NaNs with next non-NaN / # consecutive NaNs

Question

1 answers

solution1
11 ACCPTED 2017-04-19 23:46:11

Pandas: Fill NaNs with next non-NaN / # consecutive NaNs

Question

1 answers

solution1 11 ACCPTED 2017-04-19 23:46:11

solution1
11 ACCPTED 2017-04-19 23:46:11