Fill missing row values as an average of previous and next row in pandas dataframe
My data frame looks like this:
county_name state year rank county_population city_population
31 Fairfax County Virginia 2010.0 0.0 1086730.0 60300
32 Fairfax County Virginia 2011.0 0.0 1099603.0 60300
33 Fairfax County Virginia 2013.0 0.0 1130364.0 60300
34 Fairfax County Virginia 2014.0 0.0 1138123.0 60300
35 Fairfax County Virginia 2015.0 0.0 1142245.0 60300
I want to insert the missing row for year 2012 and assign it rank 7. For the county and city population values, I want to take the average of the previous and next rows (2011 and 2013) and use those values for the missing row.
Any pointers will be highly appreciated.
EDIT 1: The expected data frame should be:
county_name state year rank county_population city_population
31 Fairfax County Virginia 2010.0 0.0 1086730.0 60300
32 Fairfax County Virginia 2011.0 0.0 1099603.0 60300
33 Fairfax County Virginia 2012.0 7.0 1114984.0 60300
34 Fairfax County Virginia 2013.0 0.0 1130364.0 60300
35 Fairfax County Virginia 2014.0 0.0 1138123.0 60300
36 Fairfax County Virginia 2015.0 0.0 1142245.0 60300
Create a new dataframe holding the missing row, concatenate it with the original, sort by year, and interpolate the missing values:
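import numpy as np
import pandas as pd
# Row for 2012: rank is known, populations are NaN so interpolate() can fill them
data = [['Fairfax County', 'Virginia', 2012, 7, np.nan, np.nan]]
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
out = pd.concat([df, pd.DataFrame(data, columns=df.columns)]) \
        .sort_values('year').interpolate()
print(out)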
Output:
>>> out
county_name state year rank county_population city_population
31 Fairfax County Virginia 2010 0.0 1086730.0 60300.0
32 Fairfax County Virginia 2011 0.0 1099603.0 60300.0
0 Fairfax County Virginia 2012 7.0 1114983.5 60300.0
33 Fairfax County Virginia 2013 0.0 1130364.0 60300.0
34 Fairfax County Virginia 2014 0.0 1138123.0 60300.0
35 Fairfax County Virginia 2015 0.0 1142245.0 60300.0
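The inserted row above keeps index 0, while the expected frame in the question is numbered 31 to 36. Below is a minimal sketch of an alternative that is not part of the original answer: it reindexes on a complete range of years instead of building the 2012 row by hand. The sample frame is rebuilt from the question's table with `year` assumed to be an integer column, and `full_years`, `missing` and `pop_cols` are illustrative names. Only the two population columns are interpolated, since recent pandas releases may warn when `interpolate()` is called on a frame that also holds text columns.

```python
import pandas as pd

# Sample frame rebuilt from the question (assumption: year stored as int).
df = pd.DataFrame(
    {
        "county_name": ["Fairfax County"] * 5,
        "state": ["Virginia"] * 5,
        "year": [2010, 2011, 2013, 2014, 2015],
        "rank": [0.0] * 5,
        "county_population": [1086730.0, 1099603.0, 1130364.0,
                              1138123.0, 1142245.0],
        "city_population": [60300] * 5,
    },
    index=range(31, 36),
)

# Reindex on every year in the span so each missing year becomes a NaN row.
full_years = list(range(int(df["year"].min()), int(df["year"].max()) + 1))
out = df.set_index("year").reindex(full_years).rename_axis("year")

# Rows created by the reindex have no rank yet.
missing = out["rank"].isna()

# Text columns are copied from the previous row; rank gets the requested 7.
out[["county_name", "state"]] = out[["county_name", "state"]].ffill()
out.loc[missing, "rank"] = 7.0

# Linear interpolation across a one-row gap is the mean of its two neighbours.
pop_cols = ["county_population", "city_population"]
out[pop_cols] = out[pop_cols].interpolate()

# Restore the original column order and continue the index from 31.
out = out.reset_index()[df.columns]
out.index = range(df.index[0], df.index[0] + len(out))

print(out)
```

For 2012 this gives a county population of 1114983.5, the mean of 1099603 and 1130364. The question's expected value of 1114984 is that figure rounded, so apply `.round()` to the population columns if whole numbers are wanted.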