简体   繁体   中英

Pandas: Fill NaNs with next non-NaN / # consecutive NaNs

I'm looking to take a pandas series and fill NaN with the average of the next numerical value where: average = next numerical value / (# consecutive NaNs + 1)

Here's my code so far, I just can't figure out how to divide the filler column among the NaN s (and the next numerical value as well) in num :

import pandas as pd

dates = pd.date_range(start = '1/1/2016',end = '1/12/2016', freq = 'D')
nums = [10, 12, None, None, 39, 10, 11, None, None, None, None, 60]

df = pd.DataFrame({
        'date':dates, 
        'num':nums
        })

df['filler'] = df['num'].fillna(method = 'bfill')

Current Output:

         date   num  filler
0  2016-01-01  10.0    10.0
1  2016-01-02  12.0    12.0
2  2016-01-03   NaN    39.0
3  2016-01-04   NaN    39.0
4  2016-01-05  39.0    39.0
5  2016-01-06  10.0    10.0
6  2016-01-07  11.0    11.0
7  2016-01-08   NaN    60.0
8  2016-01-09   NaN    60.0
9  2016-01-10   NaN    60.0
10 2016-01-11   NaN    60.0
11 2016-01-12  60.0    60.0

Desired Output:

         date   num
0  2016-01-01  10.0
1  2016-01-02  12.0
2  2016-01-03  13.0
3  2016-01-04  13.0
4  2016-01-05  13.0
5  2016-01-06  10.0
6  2016-01-07  11.0
7  2016-01-08  12.0
8  2016-01-09  12.0
9  2016-01-10  12.0
10 2016-01-11  12.0
11 2016-01-12  12.0
  • Take a reverse cumsum of notnull
  • Use that to groupby and transform with mean

csum = df.num.notnull()[::-1].cumsum()
filler = df.num.fillna(0).groupby(csum).transform('mean')
df.assign(filler=filler)

         date   num  filler
0  2016-01-01  10.0    10.0
1  2016-01-02  12.0    12.0
2  2016-01-03   NaN    13.0
3  2016-01-04   NaN    13.0
4  2016-01-05  39.0    13.0
5  2016-01-06  10.0    10.0
6  2016-01-07  11.0    11.0
7  2016-01-08   NaN    12.0
8  2016-01-09   NaN    12.0
9  2016-01-10   NaN    12.0
10 2016-01-11   NaN    12.0
11 2016-01-12  60.0    12.0

how it works

  • df.num.notnull().cumsum() is a standard technique to find groups of contiguous nulls. However, I wanted my groups to end with the next numeric value. So I reversed the series and then cumsum 'd.
  • I want my average to include the number of nulls. Easiest way to do that is to fill with zero and take a normal mean over the groups II just made.
  • transform to broadcast across the existing index
  • assign new column. Despite having reversed the series, the index will realign like magic. Could have used loc but that overwrites the existing df . I'll let OP decide to overwrite if they want to.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM