I recently came across a case where I would like to replace NaN values with the rolling mean of the previous two non-NaN values, in such a way that each newly generated rolling mean is then treated as a non-NaN value and used for the next NaN. This is the sample data set:
import numpy as np
import pandas as pd
df = pd.DataFrame({'col1': [1, 3, 4, 5, 6, np.nan, np.nan, np.nan]})
df
col1
0 1.0
1 3.0
2 4.0
3 5.0
4 6.0
5 NaN # should become (5.0 + 6.0) / 2 = 5.5
6 NaN # should become (6.0 + 5.5) / 2 = 5.75
7 NaN # should become (5.5 + 5.75) / 2 = 5.625
I have also found a solution for this which I am struggling to understand:
from functools import reduce
reduce(lambda x, _: x.fillna(x.rolling(2, min_periods=2).mean().shift()), range(df['col1'].isna().sum()), df)
My problem with this solution is that reduce is called with three arguments: first the lambda function, then the iterable. I don't understand the last df that we pass to reduce, and more generally I struggle to understand how the call populates the NaN values.
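A possible explanation, as a sketch: functools.reduce(function, iterable, initializer) starts from initializer, then calls function(accumulator, item) for each item and feeds each result back in as the new accumulator. So the trailing df is the starting value, and range(...) only controls how many times the lambda runs (its items are ignored via _). The code below shows this on a toy sum, then unrolls the question's reduce call into an explicit loop that prints the column after each pass. The name fill_step is hypothetical; it is just the question's lambda given a name.

```python
from functools import reduce

import numpy as np
import pandas as pd

# reduce(function, iterable, initializer): the initializer is the starting
# accumulator; each step computes function(accumulator, item).
total = reduce(lambda acc, item: acc + item, [1, 2, 3], 10)  # ((10 + 1) + 2) + 3

df = pd.DataFrame({'col1': [1, 3, 4, 5, 6, np.nan, np.nan, np.nan]})


def fill_step(x, _):
    # Mean of rows i-2 and i-1, shifted down one row so it lines up with row i.
    # min_periods=2 makes the mean NaN whenever the window holds a NaN, so a
    # pass only fills a NaN that directly follows two non-NaN rows.
    prev_mean = x.rolling(2, min_periods=2).mean().shift()
    return x.fillna(prev_mean)


n_missing = int(df['col1'].isna().sum())  # 3 here, so 3 passes are enough

# The reduce call from the question...
filled_reduce = reduce(fill_step, range(n_missing), df)

# ...does the same as this loop; df plays the role of the initializer.
acc = df
for step in range(n_missing):
    acc = fill_step(acc, step)
    print(acc['col1'].tolist())
```

The three passes fill rows 5, 6 and 7 with 5.5, 5.75 and 5.625 in turn. This is also why the approach is slow on long runs of NaNs: it rebuilds the whole DataFrame once per missing value.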
I would appreciate any explanation of how it works. Also, is there a pandas- or NumPy-based solution? reduce does not seem efficient here.
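Because every filled value depends on values filled before it, this is a recurrence rather than a plain rolling window, so as far as I can tell rolling alone cannot compute it in one vectorized call. A single pass over the underlying NumPy array does the same work as the reduce version without building a new DataFrame for every NaN. This is a minimal sketch using a hypothetical helper, fill_with_rolling_mean, with a window parameter (window=2 matches the question); NaNs without a full window of earlier non-NaN values are left as NaN.

```python
import numpy as np
import pandas as pd


def fill_with_rolling_mean(values, window=2):
    """Hypothetical helper: replace each NaN with the mean of the `window`
    values before it, reusing values filled earlier in the same pass.
    NaNs without `window` earlier non-NaN values are left as NaN."""
    out = np.array(values, dtype=float)  # copy so the input is not modified
    for i in range(window, len(out)):
        if np.isnan(out[i]):
            prev = out[i - window:i]
            if not np.isnan(prev).any():
                out[i] = prev.mean()
    return out


df = pd.DataFrame({'col1': [1, 3, 4, 5, 6, np.nan, np.nan, np.nan]})
df['col1'] = fill_with_rolling_mean(df['col1'].to_numpy())
print(df)
```

The loop is O(n) and touches only a float array. If it is still too slow for very large data, a plain NumPy loop like this is the kind of code that JIT compilers such as numba are designed for, though that is not shown here.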
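for i in df.index:
    if pd.isna(df.loc[i, "col1"]):
        # Assumes the default RangeIndex (0, 1, 2, ...), so i - 1 and i - 2 are the previous rows
        df.loc[i, "col1"] = (df.loc[i - 1, "col1"] + df.loc[i - 2, "col1"]) / 2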
This can be a starting point using a for loop. Note that it will fail (with a KeyError) if either of the first two values of the DataFrame is NaN, because there are no earlier rows to average.
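A guarded variant of the loop above, as a sketch: it works by position with iat (so it does not rely on a RangeIndex), and it skips any NaN that has fewer than two earlier rows, or a NaN among them, instead of raising. The sample data is a hypothetical variant of the question's frame with a leading NaN.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [np.nan, 3, 4, 5, 6, np.nan, np.nan, np.nan]})

col = df.columns.get_loc('col1')  # positional index of the column
for pos in range(len(df)):
    if not pd.isna(df.iat[pos, col]):
        continue
    if pos < 2:
        continue  # not enough earlier rows: leave this NaN in place
    prev2, prev1 = df.iat[pos - 2, col], df.iat[pos - 1, col]
    if pd.isna(prev2) or pd.isna(prev1):
        continue  # an earlier value is still missing: leave this NaN in place
    df.iat[pos, col] = (prev2 + prev1) / 2  # filled value is reused by later rows

print(df)
```

The leading NaN stays as NaN, while rows 5 to 7 are filled exactly as in the original loop.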