pandas: filling NaNs with the mean of the preceding and following non-NaN values
I would like to fill df's NaN values with the average of the adjacent elements.
Consider a dataframe:
df = pd.DataFrame({'val': [1,np.nan, 4, 5, np.nan, 10, 1,2,5, np.nan, np.nan, 9]})
val
0 1.0
1 NaN
2 4.0
3 5.0
4 NaN
5 10.0
6 1.0
7 2.0
8 5.0
9 NaN
10 NaN
11 9.0
My desired output is:
val
0 1.0
1 2.5
2 4.0
3 5.0
4 7.5
5 10.0
6 1.0
7 2.0
8 5.0
9 7.0 <<< deadend
10 7.0 <<< deadend
11 9.0
I've looked into other solutions such as "Fill cell containing NaN with average of value before and after", but that approach doesn't work when there are two or more consecutive np.nan values.
Any help is greatly appreciated!
Use ffill + bfill and divide by 2:
df = (df.ffill()+df.bfill())/2
print(df)
val
0 1.0
1 2.5
2 4.0
3 5.0
4 7.5
5 10.0
6 1.0
7 2.0
8 5.0
9 7.0
10 7.0
11 9.0
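To see why this works, it can help to look at the intermediate frames. The sketch below rebuilds the question's data and prints the forward-filled and backward-filled columns next to their mean. The names prev_valid, next_valid, filled and steps are only illustrative and are not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'val': [1, np.nan, 4, 5, np.nan, 10, 1, 2, 5, np.nan, np.nan, 9]})

prev_valid = df.ffill()  # last non-NaN value at or before each row
next_valid = df.bfill()  # next non-NaN value at or after each row

# Where the original value is present, both frames hold that same value,
# so the mean leaves it unchanged. Inside a gap, the mean is the average
# of the nearest valid value on each side, however long the gap is.
filled = (prev_valid + next_valid) / 2

# Side-by-side view of each step (illustrative column labels).
steps = pd.concat(
    [df['val'], prev_valid['val'], next_valid['val'], filled['val']],
    axis=1,
    keys=['val', 'ffill', 'bfill', 'mean'],
)
print(steps)
```

Every row of a run of consecutive NaNs sees the same pair of neighbours, which is why rows 9 and 10 both come out as 7.0.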
EDIT: If the first or last element is NaN, then use (Dark's suggestion):
df = pd.DataFrame({'val':[np.nan,1,np.nan, 4, 5, np.nan,
10, 1,2,5, np.nan, np.nan, 9,np.nan,]})
df = (df.ffill()+df.bfill())/2
df = df.bfill().ffill()
print(df)
val
0 1.0
1 1.0
2 2.5
3 4.0
4 5.0
5 7.5
6 10.0
7 1.0
8 2.0
9 5.0
10 7.0
11 7.0
12 9.0
13 9.0
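The second pass is needed because a leading NaN has no earlier value for ffill to carry forward, and a trailing NaN has no later value for bfill to carry back. The sum is therefore still NaN at those positions. A minimal sketch that wraps both steps in one function follows; the name fill_with_neighbor_mean is hypothetical and does not come from pandas or from the original answer.

```python
import numpy as np
import pandas as pd


def fill_with_neighbor_mean(frame):
    """Hypothetical helper combining the two steps shown above."""
    # Step 1: interior gaps get the mean of the nearest valid neighbours.
    averaged = (frame.ffill() + frame.bfill()) / 2
    # Step 2: NaNs at the edges have only one valid neighbour, so they
    # are still NaN here; copy the nearest valid value into them.
    return averaged.bfill().ffill()


df = pd.DataFrame({'val': [np.nan, 1, np.nan, 4, 5, np.nan,
                           10, 1, 2, 5, np.nan, np.nan, 9, np.nan]})
result = fill_with_neighbor_mean(df)
print(result)
```

Note that the edge rows end up as a copy of their single neighbour rather than an average, which matches the output listed above.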
Although it doesn't produce the exact output you specified when there are multiple NaNs in a row, other users reaching this page may actually prefer the effect of the interpolate() method:
df = df.interpolate()
print(df)
val
0 1.0
1 2.5
2 4.0
3 5.0
4 7.5
5 10.0
6 1.0
7 2.0
8 5.0
9 6.3
10 7.7
11 9.0
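The two approaches only differ on runs of consecutive NaNs. Assuming the default linear method, interpolate() spaces the missing values evenly between the two surrounding values, while the ffill/bfill mean gives every row in the run the same value. A small sketch comparing them on the question's data (the names neighbor_mean, linear and comparison are illustrative only):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'val': [1, np.nan, 4, 5, np.nan, 10, 1, 2, 5, np.nan, np.nan, 9]})

neighbor_mean = (df.ffill() + df.bfill()) / 2  # same value across a gap
linear = df.interpolate()                      # evenly spaced across a gap

comparison = pd.DataFrame({
    'neighbor_mean': neighbor_mean['val'],
    'interpolate': linear['val'],
})
# Rounded for display only; the underlying values keep full precision.
print(comparison.round(2))
```

For single-row gaps (rows 1 and 4) both columns agree. For rows 9 and 10 the mean gives 7.0 twice, whereas linear interpolation steps from 5 to 9 in thirds, giving roughly 6.33 and 7.67 (shown rounded as 6.3 and 7.7 above).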