如何用 R 插值数据？

Question

I have a data frame that look like this:我有一个看起来像这样的数据框：

z <- data.frame(ent = c(1, 1, 1, 2, 2, 2, 3, 3, 3), year = c(1995, 2000, 2005, 1995, 2000, 2005, 1995, 2000, 2005), pobtot = c(50, 60, 70, 10, 4, 1, 100, 105, 110))

As you can see, there is a gap between 5 years for every "ent".如您所见，每个“ent”之间都有5年的差距。 I want to interpolate data to every missing year: 1996, 1997, 1998, 1999, 2001, 2002, 2003, 2004 and also prognosticate to 2006, 2007 and 2008. Is there a way to do this?我想将数据插入到每个缺失的年份：1996、1997、1998、1999、2001、2002、2003、2004 并预测到 2006、2007 和 2008。有没有办法做到这一点？

Any help would be appreciated.任何帮助，将不胜感激。

Answer 1

We can use complete to expand the data for each 'ent' and the 'year' range, then with na.approx interpolate the missing values in 'pobtot'我们可以使用complete来扩展每个 'ent' 和 'year' 范围的数据，然后使用na.approx插入 'pobtot' 中的缺失值

library(dplyr)
library(tidyr)
z %>% 
   complete(ent, year = 1995:2008) %>% 
   mutate(pobtot = zoo::na.approx(pobtot, na.rm = FALSE))

Answer 2

Assuming you want linear interpolation, R uses approx() for such things by default, eg for drawing lines in a plot.假设你想要线性插值，R 默认使用approx()来处理这些事情，例如在 plot 中绘制线条。 We may also use that function to interpolate the years.我们也可以使用 function 来插入年份。 It doesn't extrapolate, though, but we could use forecast::ets() with default settings for this which calculates an exponential smoothing state space model.不过，它不能外推，但我们可以使用带有默认设置的forecast::ets()来计算指数平滑 state 空间 model。 Note, however, that this may also produce negative values, but OP hasn't stated what is needed in such a case.但是请注意，这也可能会产生负值，但 OP 并未说明在这种情况下需要什么。 So anyway in a by() approach we could do:所以无论如何，在by()方法中，我们可以这样做：

library(forecast)
p <- 3  ## define number of years for prediction

res <- do.call(rbind, by(z, z$ent, function(x) {
  yseq <- min(x$year):(max(x$year) + p)  ## sequence of years + piction
  a <- approx(x$year, x$pobtot, head(yseq, -p))$y  ## linear interpolation
  f <- predict(ets(a), 3) ## predict `p` years
  r <- c(a, f$mean)  ## combine interpolation and prediction
  data.frame(ent=x$ent[1], year=yseq, pobtot=r)  ## output as data frame
}))

Result结果

res
#      ent year pobtot
# 1.1    1 1995   50.0
# 1.2    1 1996   52.0
# 1.3    1 1997   54.0
# 1.4    1 1998   56.0
# 1.5    1 1999   58.0
# 1.6    1 2000   60.0
# 1.7    1 2001   62.0
# 1.8    1 2002   64.0
# 1.9    1 2003   66.0
# 1.10   1 2004   68.0
# 1.11   1 2005   70.0
# 1.12   1 2006   72.0
# 1.13   1 2007   74.0
# 1.14   1 2008   76.0
# 2.1    2 1995   10.0
# 2.2    2 1996    8.8
# 2.3    2 1997    7.6
# 2.4    2 1998    6.4
# 2.5    2 1999    5.2
# 2.6    2 2000    4.0
# 2.7    2 2001    3.4
# 2.8    2 2002    2.8
# 2.9    2 2003    2.2
# 2.10   2 2004    1.6
# 2.11   2 2005    1.0
# 2.12   2 2006    0.4
# 2.13   2 2007   -0.2
# 2.14   2 2008   -0.8
# 3.1    3 1995  100.0
# 3.2    3 1996  101.0
# 3.3    3 1997  102.0
# 3.4    3 1998  103.0
# 3.5    3 1999  104.0
# 3.6    3 2000  105.0
# 3.7    3 2001  106.0
# 3.8    3 2002  107.0
# 3.9    3 2003  108.0
# 3.10   3 2004  109.0
# 3.11   3 2005  110.0
# 3.12   3 2006  111.0
# 3.13   3 2007  112.0
# 3.14   3 2008  113.0

We could quickly check this in a plot, which, apart from the negative values of entity 2 looks quite reasonable.我们可以在 plot 中快速检查这一点，除了实体 2 的负值之外，它看起来非常合理。

with(res, plot(year, pobtot, type='n', main='z'))
with(res[res$year < 2006, ], points(year, pobtot, pch=20, col=3))
with(res[res$year > 2005, ], points(year, pobtot, pch=20, col=4))
with(res[res$year %in% z$year, ], points(year, pobtot, pch=20, col=1))
abline(h=0, lty=3)
legend(2005.25, 50, c('measurem.', 'interpol.', 'extrapol.'), pch=20,
       col=c(1, 3, 4), cex=.8, bty='n')

如何用 R 插值数据？

问题描述

2 个解决方案

解决方案1
2 已采纳 2021-05-16 03:44:53

解决方案2
2 2021-05-16 06:50:02

Result结果

如何用 R 插值数据？

问题描述

2 个解决方案

解决方案1 2 已采纳 2021-05-16 03:44:53

解决方案2 2 2021-05-16 06:50:02

Result结果

解决方案1
2 已采纳 2021-05-16 03:44:53

解决方案2
2 2021-05-16 06:50:02