How to fill NaN values with a rolling mean in pandas

I have a DataFrame that contains NaN values in a few places. I am trying to clean the data by filling each NaN with the mean of its previous five values. To do so, I came up with the following:

input_data_frame[var_list].fillna(input_data_frame[var_list].rolling(5).mean(), inplace=True)

But this is not working: it isn't filling the NaN values, and the DataFrame's null count is the same before and after the operation above. Assuming I have a DataFrame with just an integer column, how can I fill NaN values with the mean of the previous five values? Thanks in advance.
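A minimal sketch reproducing the problem on a small hypothetical column (df, x and var_list are illustrative names, not from the original post). It shows two reasons the call above leaves the NaNs in place: the rolling means at the gaps are themselves NaN, and the in-place fill acts on a temporary copy.

```python
import numpy as np
import pandas as pd

# Hypothetical example data: one column with a single gap at index 5.
df = pd.DataFrame({'x': [1, 2, 3, 4, 5, np.nan, 7]})
var_list = ['x']

# Reason 1: rolling(5) defaults to min_periods=5, and the 5-row window that
# ends at the gap contains the NaN itself, so the fill value there is NaN too.
fill_values = df[var_list].rolling(5).mean()
print(fill_values.loc[5, 'x'])  # nan

# Reason 2: selecting columns with a list returns a new DataFrame, so an
# in-place fillna modifies that temporary object, not df.
# (Depending on the pandas version, this line may also emit a warning.)
df[var_list].fillna(0, inplace=True)
print(df['x'].isna().sum())  # 1: the gap is still there
```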

This should work:这应该有效:

input_data_frame[var_list] = input_data_frame[var_list].fillna(input_data_frame[var_list].rolling(6, min_periods=1).mean())

Note that the window is 6 because it includes the NaN itself (which is not counted in the average). The other NaN values are not used for the averages either, so if fewer than 5 values are found in the window, the average is calculated on the values actually present (min_periods=1 only requires one of them).
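The 6-row window can also be written as "shift by one row, then average 5 rows", which states the "previous five" intent directly. This alternative is sketched below on a single numeric Series; it is not the answer's own code, but at every gap it averages exactly the same observed values, because the only extra row in the 6-row window is the NaN being filled.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 1, 2, 3, 4, 5, np.nan, 1, 1, 2, 3, 4, 5, np.nan])

# The answer's form: a 6-row window ending at the gap; the NaN itself is skipped.
window6 = s.rolling(6, min_periods=1).mean()

# Alternative: shift so each row sees only the rows before it, then use 5 rows.
prev5 = s.shift(1).rolling(5, min_periods=1).mean()

filled_a = s.fillna(window6)
filled_b = s.fillna(prev5)
print(filled_a[[6, 13]].tolist())  # [3.0, 3.0]
print(filled_a.equals(filled_b))   # True
```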

Example:

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 2, 3, 4, 5, np.nan, 1, 1, 2, 3, 4, 5, np.nan]})
print(df)

      a
0   1.0
1   1.0
2   2.0
3   3.0
4   4.0
5   5.0
6   NaN
7   1.0
8   1.0
9   2.0
10  3.0
11  4.0
12  5.0
13  NaN

Output after filling with df = df.fillna(df.rolling(6, min_periods=1).mean()):

      a
0   1.0
1   1.0
2   2.0
3   3.0
4   4.0
5   5.0
6   3.0
7   1.0
8   1.0
9   2.0
10  3.0
11  4.0
12  5.0
13  3.0

The pd.rolling_mean function has since been deprecated and removed from pandas in favor of the .rolling() method. If you fill the entire dataset, you can use:

filled_dataset = dataset.fillna(dataset.rolling(6,min_periods=1).mean())
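On a frame with several columns, DataFrame.rolling computes each column's window independently, so every column is filled from its own history. A minimal sketch, assuming a hypothetical all-numeric dataset with columns b and c (non-numeric columns are not covered here):

```python
import numpy as np
import pandas as pd

# Hypothetical all-numeric dataset with gaps in two columns.
dataset = pd.DataFrame({
    'b': [10, 20, np.nan, 40, 50, 60],
    'c': [1, 2, 3, 4, np.nan, np.nan],
})

# Each column gets the mean of its own previous (up to) five observed values.
filled_dataset = dataset.fillna(dataset.rolling(6, min_periods=1).mean())
print(filled_dataset)
```

The two consecutive gaps in c both get 2.5: the rolling means are computed once from the original data, so the value filled at index 4 is not used when filling index 5.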

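You can simply use interpolate(), although note that it fills gaps by linear interpolation between the neighboring values rather than with the mean of the previous five: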

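import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 5, np.nan, np.nan, np.nan, 2, 5, np.nan]})
print(df)

# interpolate() returns a new Series; assign it back to keep the result
df['a'] = df['a'].interpolate()
print(df)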
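interpolate() and the rolling-mean fill give different results on the same gaps. A small comparison on the column above (the Series s is just a copy of that data), assuming pandas' default linear interpolation:

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 5, np.nan, np.nan, np.nan, 2, 5, np.nan])

# Linear interpolation between observed neighbours; a trailing gap takes the last value.
linear = s.interpolate()

# Mean of up to five previous observed values, as in the earlier answers.
trailing_mean = s.fillna(s.rolling(6, min_periods=1).mean())

print(pd.DataFrame({'original': s, 'interpolate': linear, 'rolling_mean': trailing_mean}))
```

Interpolation uses the next observed value as well as the previous one, so it looks ahead; if the fill must depend only on earlier rows, the rolling-mean version is the one that matches the question.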

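Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please cite this site's URL or the original post's URL. For any questions, contact: yoyou2525@163.com.

Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM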