
Filling NaN values with rolling mean of the previous non-NaN values

I recently came across a case where I want to replace NaN values with the rolling mean of the previous two non-NaN values, such that each newly computed mean is then treated as a non-NaN value and used when filling the next NaN. This is the sample data set:

import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [1, 3, 4, 5, 6, np.nan, np.nan, np.nan]})
df

   col1
0   1.0
1   3.0
2   4.0
3   5.0
4   6.0
5   NaN  # (6.0 + 5.0) / 2
6   NaN  # (5.5 + 6.0) / 2
7   NaN  # ...
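
To spell out the arithmetic behind the comments above, here is a minimal sketch in plain Python. It assumes a fixed window of two, as the comments suggest, and the list name `values` is only illustrative:

```python
# Each fill is the mean of the two values directly before it,
# and the filled value is then reused for the next one.
values = [1.0, 3.0, 4.0, 5.0, 6.0]
for _ in range(3):  # three NaN rows to fill
    values.append((values[-1] + values[-2]) / 2)

print(values[5:])  # [5.5, 5.75, 5.625]
```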

I have also found a solution for this which I am struggling to understand:

from functools import reduce

reduce(lambda x, _: x.fillna(x.rolling(2, min_periods=2).mean().shift()), range(df['col1'].isna().sum()), df)

My problem with this solution is that reduce takes three arguments: first the lambda function, then the iterable. I don't understand the final df passed to reduce in the solution above, and I struggle to understand how the call as a whole fills in the NaN values.
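
As a hedged illustration of the point in question: the third argument of `functools.reduce` is the initial value of the accumulator, so the call starts from `df` and applies the function once per item of the `range`. The sketch below unrolls that into an explicit loop. The names `fill_once`, `n_missing`, `via_reduce` and `acc` are made up for this example and are not part of the original solution.

```python
from functools import reduce

import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [1, 3, 4, 5, 6, np.nan, np.nan, np.nan]})

def fill_once(x, _):
    # rolling(2).mean() gives the mean of each row and the row before it;
    # shift() moves that down one row, so row i holds the mean of rows
    # i-1 and i-2. fillna only writes into cells that are currently NaN.
    return x.fillna(x.rolling(2, min_periods=2).mean().shift())

n_missing = int(df['col1'].isna().sum())

# reduce(f, iterable, initial): acc = initial, then acc = f(acc, item) per item.
via_reduce = reduce(fill_once, range(n_missing), df)

# The same computation written as a plain loop.
acc = df
for step in range(n_missing):
    acc = fill_once(acc, step)

print(acc['col1'].tolist())
```

Each pass can only fill the first remaining NaN of a run, because the rows after it still have a NaN inside their two-row window. That is why the function is applied as many times as there are NaN values, and the items of the `range` are ignored (the `_` parameter).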

I would appreciate any explanation of how it works. I would also like to know whether there is a pandas- or numpy-based solution, since reduce does not seem efficient here.

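# Assumes the default RangeIndex (0, 1, 2, ...), so i - 1 and i - 2 are valid labels.
for i in df.index:
    if np.isnan(df.loc[i, "col1"]):
        # .loc writes into df itself; chained df["col1"][i] = ... may assign to a copy
        df.loc[i, "col1"] = (df.loc[i - 1, "col1"] + df.loc[i - 2, "col1"]) / 2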

This for loop can be a starting point. It will fail if either of the first two values of the DataFrame is NaN, since there are not two earlier rows to average.
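
One possible way to guard against that case is to work on a NumPy copy of the column by position and start at the third element. This is only a sketch under that assumption; the helper name `fill_with_prev_two_mean` is hypothetical, and leaving leading NaN values untouched is a design choice, not something the question specifies.

```python
import numpy as np
import pandas as pd

def fill_with_prev_two_mean(series):
    """Fill each NaN with the mean of the two values before it, in one pass."""
    values = series.to_numpy(dtype=float, copy=True)
    for i in range(2, len(values)):  # positions 0 and 1 lack two predecessors
        if np.isnan(values[i]):
            # If either predecessor is still NaN, the result stays NaN.
            values[i] = (values[i - 1] + values[i - 2]) / 2
    return pd.Series(values, index=series.index, name=series.name)

df = pd.DataFrame({'col1': [1, 3, 4, 5, 6, np.nan, np.nan, np.nan]})
df['col1'] = fill_with_prev_two_mean(df['col1'])
print(df['col1'].tolist())  # [1.0, 3.0, 4.0, 5.0, 6.0, 5.5, 5.75, 5.625]
```

Because the loop indexes by position rather than by label, it does not depend on the index being 0, 1, 2, and so on, and it visits each row once instead of making one pass over the frame per missing value.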


 