How to ignore NaN values for a rolling mean calculation in pandas DataFrame?

I am trying to create a DataFrame containing a rolling mean over a window of length 5. However, my data contains one NaN value, and as a result I only get NaN values in Column3, the column that contains it. How can I ignore NaN values when using .rolling(5).mean()?

I have this sample data df1:

    Column1 Column2 Column3 Column4
0   1       5       -9.0    13
1   1       6       -10.0   15
2   3       7       -5.0    11
3   4       8       NaN     9
4   6       5       -2.0    8
5   2       8       0.0     10
6   3       8       -3.0    12

For convenience:

import numpy as np
import pandas as pd

# Create the sample DataFrame; Column3 contains one missing value (np.nan)
df1 = pd.DataFrame({
    'Column1': [1, 1, 3, 4, 6, 2, 3],
    'Column2': [5, 6, 7, 8, 5, 8, 8],
    'Column3': [-9, -10, -5, np.nan, -2, 0, -3],
    'Column4': [13, 15, 11, 9, 8, 10, 12]
})
df1

When I compute a rolling mean over a window of 5, I get only NaN values for Column3:

df2 = df1.rolling(5).mean()


    Column1 Column2 Column3 Column4
0   NaN     NaN     NaN     NaN
1   NaN     NaN     NaN     NaN
2   NaN     NaN     NaN     NaN
3   NaN     NaN     NaN     NaN
4   3.0     6.2     NaN     11.2
5   3.2     6.8     NaN     10.6
6   3.6     7.2     NaN     10.0
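Why Column3 comes out all NaN: by default min_periods equals the window size, so every 5-row window needs 5 non-NaN values, and each complete window here includes row 3. A small sketch (it recreates df1 with np.nan directly) that counts the valid values per window with Rolling.count:

```python
import numpy as np
import pandas as pd

# Sample data from the question, with the missing value as np.nan
df1 = pd.DataFrame({
    'Column1': [1, 1, 3, 4, 6, 2, 3],
    'Column2': [5, 6, 7, 8, 5, 8, 8],
    'Column3': [-9, -10, -5, np.nan, -2, 0, -3],
    'Column4': [13, 15, 11, 9, 8, 10, 12],
})

# Number of non-NaN observations in each trailing 5-row window
counts = df1.rolling(5).count()

# Default min_periods is 5, so a window holding only 4 valid values yields NaN
df2 = df1.rolling(5).mean()

print(counts.loc[4:])
print(df2.loc[4:])
```

Column3 has only 4 valid values in each of the last three windows, one short of the default requirement.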

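Pandas' DataFrame.mean has a skipna flag that tells it to ignore NaN values, see

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.mean.html

Rolling.mean, however, has no skipna flag: rolling aggregations already skip NaN values, and a window only returns NaN when it contains fewer than min_periods non-NaN values. Since min_periods defaults to the window size (5 here), every window that includes the NaN in Column3 fails that check. Try lowering min_periods:

df2 = df1.rolling(5, min_periods=1).mean()

or, with NumPy's NaN-aware mean (pd.np was deprecated and removed in pandas 2.0, so use NumPy directly; min_periods is still needed, because windows with too few non-NaN values are never passed to the function):

df2 = df1.rolling(5, min_periods=1).apply(np.nanmean, raw=True)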
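Whatever aggregation is used, a 5-row window that contains the NaN has only 4 valid values, fewer than the default min_periods of 5. A minimal sketch that sets min_periods (a real parameter of DataFrame.rolling) to 4, i.e. it tolerates at most one missing value per window; the value 4 is a judgment call for this data, not a rule:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    'Column1': [1, 1, 3, 4, 6, 2, 3],
    'Column2': [5, 6, 7, 8, 5, 8, 8],
    'Column3': [-9, -10, -5, np.nan, -2, 0, -3],
    'Column4': [13, 15, 11, 9, 8, 10, 12],
})

# Require at least 4 non-NaN values per 5-row window (at most one missing);
# the NaN itself is skipped when the mean is computed
df2 = df1.rolling(5, min_periods=4).mean()

# min_periods=1 also returns partial means for the first rows of the frame
df2_any = df1.rolling(5, min_periods=1).mean()

print(df2)
print(df2_any)
```

With min_periods=4, row 4 of Column3 becomes the mean of -9, -10, -5 and -2, and rows 0 to 2 stay NaN because those windows are still too short.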

You could also fill the NaN with either 0 or the column mean before computing the rolling mean.

The following works:

df1 = df1.fillna(df1.mean())

df2 = df1.rolling(5).mean()
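A self-contained sketch comparing the two fill values mentioned here (column mean and 0). Note that df1.mean() averages the whole column, so the filled value also draws on rows that come after the gap:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    'Column1': [1, 1, 3, 4, 6, 2, 3],
    'Column2': [5, 6, 7, 8, 5, 8, 8],
    'Column3': [-9, -10, -5, np.nan, -2, 0, -3],
    'Column4': [13, 15, 11, 9, 8, 10, 12],
})

# Option 1: replace each NaN with its column's mean (computed over non-NaN values)
filled_mean = df1.fillna(df1.mean())
df2_mean = filled_mean.rolling(5).mean()

# Option 2: replace each NaN with 0
filled_zero = df1.fillna(0)
df2_zero = filled_zero.rolling(5).mean()

print(df2_mean['Column3'])
print(df2_zero['Column3'])
```

Both options keep every row, but the filled value enters each window as if it were real data, so the results differ from a mean over the 4 valid values.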

You can use:

df2 = df1[df1['Column3'].notna()].rolling(5).mean()

Here you simply build a new DataFrame without the rows that contain NaN.
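A sketch of what dropping the row does to the result: index 3 disappears for every column, and a window now spans more than 5 original rows (the value at index 5 averages original rows 0, 1, 2, 4 and 5). The reindex step is an optional addition, not part of the answer:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    'Column1': [1, 1, 3, 4, 6, 2, 3],
    'Column2': [5, 6, 7, 8, 5, 8, 8],
    'Column3': [-9, -10, -5, np.nan, -2, 0, -3],
    'Column4': [13, 15, 11, 9, 8, 10, 12],
})

# Keep only the rows where Column3 is not NaN (drops index 3 for all columns)
clean = df1[df1['Column3'].notna()]
df2 = clean.rolling(5).mean()

# Optional: restore the original index, with the dropped row as all-NaN
df2_aligned = df2.reindex(df1.index)

print(df2)
print(df2_aligned)
```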

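If you don't want to lose the data in the columns without NaN:

df2 = df1.drop("Column3", axis=1).rolling(5).mean()
df2["Column3"] = df1["Column3"].dropna().rolling(5).mean()

This calculates the rolling mean for all the columns without NaN first, then separately for the column with NaN after dropping its missing value. The assignment aligns on the index, so the dropped row stays NaN in Column3.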
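The per-column idea can be applied to every column at once, without naming the column that has NaN. A minimal sketch, assuming the intent is to drop each column's own NaN values before rolling; it uses only Series.dropna, Series.rolling and DataFrame.reindex:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    'Column1': [1, 1, 3, 4, 6, 2, 3],
    'Column2': [5, 6, 7, 8, 5, 8, 8],
    'Column3': [-9, -10, -5, np.nan, -2, 0, -3],
    'Column4': [13, 15, 11, 9, 8, 10, 12],
})

# Rolling mean of each column over that column's non-NaN values only.
# Dropping NaN per column (not per row) leaves the other columns untouched;
# the dict of Series is aligned on the union of indexes, then reindexed to df1's.
df2 = pd.DataFrame(
    {col: df1[col].dropna().rolling(5).mean() for col in df1.columns}
).reindex(df1.index)

print(df2)
```

As with dropping rows, the windows for Column3 after the gap cover six original rows, because the missing row is skipped rather than counted.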

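Disclaimer: The technical posts on this site follow the CC BY-SA 4.0 license. If you need to republish, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.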

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM