How to ignore NaN values for a rolling mean calculation in a pandas DataFrame?
I am trying to create a DataFrame containing a rolling mean based on a window of length 5. But my data contains one NaN value, and therefore I only get NaN values for column 3, the column that contains the NaN value.
How is it possible to ignore NaN values when using .rolling(5).mean()?
I have this sample data df1:
Column1 Column2 Column3 Column4
0 1 5 -9.0 13
1 1 6 -10.0 15
2 3 7 -5.0 11
3 4 8 NaN 9
4 6 5 -2.0 8
5 2 8 0.0 10
6 3 8 -3.0 12
For convenience:
import numpy as np
import pandas as pd

# create DataFrame with NaN
df1 = pd.DataFrame({
    'Column1': [1, 1, 3, 4, 6, 2, 3],
    'Column2': [5, 6, 7, 8, 5, 8, 8],
    'Column3': [-9, -10, -5, np.nan, -2, 0, -3],
    'Column4': [13, 15, 11, 9, 8, 10, 12]
})
df1
When I create a rolling mean based on a window of 5, I get only NaN values for column 3.
df2 = df1.rolling(5).mean()
Column1 Column2 Column3 Column4
0 NaN NaN NaN NaN
1 NaN NaN NaN NaN
2 NaN NaN NaN NaN
3 NaN NaN NaN NaN
4 3.0 6.2 NaN 11.2
5 3.2 6.8 NaN 10.6
6 3.6 7.2 NaN 10.0
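pandas' DataFrame.mean has a skipna flag that tells it to ignore NaNs, see
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.mean.html
Rolling.mean does not accept that flag, however, so df1.rolling(5).mean(skipna=True) raises an error. A rolling mean already leaves NaNs out of the calculation; the result is NaN only because the window then holds fewer valid values than min_periods, which defaults to the window length. Try
df2 = df1.rolling(5, min_periods=1).mean()
or
df2 = df1.rolling(5, min_periods=1).apply(np.nanmean)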
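To make the role of min_periods concrete, here is a minimal sketch on the question's data. It assumes that requiring 4 valid values per window is acceptable; the names strict and relaxed are only illustrative.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    'Column1': [1, 1, 3, 4, 6, 2, 3],
    'Column2': [5, 6, 7, 8, 5, 8, 8],
    'Column3': [-9, -10, -5, np.nan, -2, 0, -3],
    'Column4': [13, 15, 11, 9, 8, 10, 12],
})

# Default: min_periods equals the window length (5), so every window
# that contains the NaN has only 4 valid values and yields NaN.
strict = df1.rolling(5).mean()

# Require only 4 valid values per window: the NaN is skipped and the
# mean is taken over the values that remain.
relaxed = df1.rolling(5, min_periods=4).mean()

print(relaxed)
```

With min_periods=4 the complete columns also get a value at row 3 (a mean over 4 rows), so choose the threshold according to how many leading rows should stay NaN.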
You can fill the NaN with either 0 or the column mean before rolling.
The following works:
df1 = df1.fillna(df1.mean())
df2 = df1.rolling(5).mean()
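Filling changes the data, so the two fill values give different rolling means. A small sketch on Column3 alone, where the names col3, filled_mean and filled_zero are only illustrative:

```python
import numpy as np
import pandas as pd

col3 = pd.Series([-9, -10, -5, np.nan, -2, 0, -3], name='Column3')

# Fill the gap with the column mean (the NaN is ignored when computing it).
filled_mean = col3.fillna(col3.mean())

# Alternative: fill the gap with 0.
filled_zero = col3.fillna(0)

roll_mean_fill = filled_mean.rolling(5).mean()
roll_zero_fill = filled_zero.rolling(5).mean()

print(pd.DataFrame({'mean_fill': roll_mean_fill, 'zero_fill': roll_zero_fill}))
```

Either way the filled value counts as a real observation in every window that covers row 3, which is the main difference from skipping the NaN.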
You can use:
df2 = df1[df1['Column3'].notna()].rolling(5).mean()
Here you simply form a new DataFrame without the rows that contain NaN.
If you don't want to lose the data in the good columns:
df2 = df1.drop("Column3", axis=1).rolling(5).mean()
df2["Column3"] = df1['Column3'].dropna().rolling(5).mean()
You calculate the rolling mean for all good columns first, then for the one with NaN.
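The per-column approach can be sketched end to end on the question's data. This sketch uses .dropna() so that the rolling mean runs over the values of Column3 themselves rather than over a boolean mask.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    'Column1': [1, 1, 3, 4, 6, 2, 3],
    'Column2': [5, 6, 7, 8, 5, 8, 8],
    'Column3': [-9, -10, -5, np.nan, -2, 0, -3],
    'Column4': [13, 15, 11, 9, 8, 10, 12],
})

# Rolling mean for the columns that have no NaN.
df2 = df1.drop('Column3', axis=1).rolling(5).mean()

# Drop the NaN from Column3 only, then roll over the remaining values.
# Assignment aligns on the index, so row 3 stays NaN in the result.
df2['Column3'] = df1['Column3'].dropna().rolling(5).mean()

print(df2)
```

Each Column3 window here holds 5 valid values, so a window that crosses the gap spans 6 original rows, and Column3 ends up as the last column of df2.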