Pandas: Fill NaNs with next non-NaN / # consecutive NaNs

Question

I'm looking to take a pandas Series and fill each NaN with an average derived from the next numerical value, where: average = next numerical value / (# consecutive NaNs + 1)

Here's my code so far. I just can't figure out how to divide the filler column among the NaNs (and the next numerical value as well) in num:

import pandas as pd

dates = pd.date_range(start = '1/1/2016',end = '1/12/2016', freq = 'D')
nums = [10, 12, None, None, 39, 10, 11, None, None, None, None, 60]

df = pd.DataFrame({
        'date':dates, 
        'num':nums
        })

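# Back-fill each NaN with the next non-NaN value.
# (fillna(method='bfill') is deprecated in recent pandas; .bfill() is equivalent.)
df['filler'] = df['num'].bfill()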

Current Output:

         date   num  filler
0  2016-01-01  10.0    10.0
1  2016-01-02  12.0    12.0
2  2016-01-03   NaN    39.0
3  2016-01-04   NaN    39.0
4  2016-01-05  39.0    39.0
5  2016-01-06  10.0    10.0
6  2016-01-07  11.0    11.0
7  2016-01-08   NaN    60.0
8  2016-01-09   NaN    60.0
9  2016-01-10   NaN    60.0
10 2016-01-11   NaN    60.0
11 2016-01-12  60.0    60.0

Desired Output:

         date   num
0  2016-01-01  10.0
1  2016-01-02  12.0
2  2016-01-03  13.0
3  2016-01-04  13.0
4  2016-01-05  13.0
5  2016-01-06  10.0
6  2016-01-07  11.0
7  2016-01-08  12.0
8  2016-01-09  12.0
9  2016-01-10  12.0
10 2016-01-11  12.0
11 2016-01-12  12.0

Answer 1

Take a reverse cumsum of notnull.
Use that to groupby and transform with mean.

csum = df.num.notnull()[::-1].cumsum()
filler = df.num.fillna(0).groupby(csum).transform('mean')
df.assign(filler=filler)

         date   num  filler
0  2016-01-01  10.0    10.0
1  2016-01-02  12.0    12.0
2  2016-01-03   NaN    13.0
3  2016-01-04   NaN    13.0
4  2016-01-05  39.0    13.0
5  2016-01-06  10.0    10.0
6  2016-01-07  11.0    11.0
7  2016-01-08   NaN    12.0
8  2016-01-09   NaN    12.0
9  2016-01-10   NaN    12.0
10 2016-01-11   NaN    12.0
11 2016-01-12  60.0    12.0
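
The snippet above relies on the df built in the question. As a minimal self-contained sketch, the version below rebuilds that frame and compares the filler column with the values listed under Desired Output. The names result and expected are only illustrative and do not appear in the original post.

```python
import pandas as pd

# Rebuild the example frame from the question.
dates = pd.date_range(start='1/1/2016', end='1/12/2016', freq='D')
nums = [10, 12, None, None, 39, 10, 11, None, None, None, None, 60]
df = pd.DataFrame({'date': dates, 'num': nums})

# Group key: reversed cumulative count of non-null values, so each run of
# NaNs shares a key with the numeric value that follows it.
csum = df['num'].notnull()[::-1].cumsum()

# Zero-fill the NaNs, then broadcast each group's mean back to every row.
filler = df['num'].fillna(0).groupby(csum).transform('mean')
result = df.assign(filler=filler)

# Values copied from the "Desired Output" table in the question.
expected = [10.0, 12.0, 13.0, 13.0, 13.0, 10.0, 11.0,
            12.0, 12.0, 12.0, 12.0, 12.0]
print(result['filler'].tolist() == expected)
```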

How it works

df.num.notnull().cumsum() is a standard technique to find groups of contiguous nulls. However, I wanted my groups to end with the next numeric value, so I reversed the series and then took the cumsum.
I want my average to include the number of nulls. The easiest way to do that is to fill with zero and take a normal mean over the groups I just made.
Use transform to broadcast the group means across the existing index.
Assign the result as a new column. Despite the series having been reversed, the index realigns automatically. I could have used loc, but that overwrites the existing df. I'll let the OP decide whether to overwrite.
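
To make the steps above concrete, here is a rough sketch that prints the intermediate group keys and cross-checks the zero-fill mean against the formula from the question (next value divided by the number of consecutive NaNs plus one). The names group_id, group_size, via_mean, via_formula and overwritten are hypothetical and were not part of the original answer. The sort_index call is there only to display the keys in row order.

```python
import pandas as pd

dates = pd.date_range(start='1/1/2016', end='1/12/2016', freq='D')
nums = [10, 12, None, None, 39, 10, 11, None, None, None, None, 60]
df = pd.DataFrame({'date': dates, 'num': nums})

# Step 1: reversed cumsum of notnull. Each NaN run gets the same key as
# the numeric value that ends it.
group_id = df['num'].notnull()[::-1].cumsum().sort_index()
print(group_id.tolist())  # [6, 5, 4, 4, 4, 3, 2, 1, 1, 1, 1, 1]

# Step 2: zero-fill and take the per-group mean, broadcast to every row.
via_mean = df['num'].fillna(0).groupby(group_id).transform('mean')

# Cross-check with the question's formula:
# next numeric value / (# consecutive NaNs + 1), i.e. bfill / group size.
group_size = group_id.groupby(group_id).transform('count')
via_formula = df['num'].bfill() / group_size
print(via_mean.tolist() == via_formula.tolist())

# Optional: overwrite 'num' on a copy instead of adding a new column.
overwritten = df.copy()
overwritten['num'] = via_mean
```

Working on a copy keeps the original df intact, which matches the answer's choice of assign over an in-place write.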
