
How to interpolate data with R?

I have a data frame that looks like this:

z <- data.frame(ent = c(1, 1, 1, 2, 2, 2, 3, 3, 3), year = c(1995, 2000, 2005, 1995, 2000, 2005, 1995, 2000, 2005), pobtot = c(50, 60, 70, 10, 4, 1, 100, 105, 110))

As you can see, there is a 5-year gap between observations for every "ent". I want to interpolate the data for every missing year (1996, 1997, 1998, 1999, 2001, 2002, 2003, 2004) and also forecast 2006, 2007 and 2008. Is there a way to do this?

Any help would be appreciated.

We can use complete to expand the data to the full 'year' range for each 'ent', then use na.approx to interpolate the missing values in 'pobtot'.

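library(dplyr)
library(tidyr)
z %>% 
   complete(ent, year = 1995:2008) %>%  ## one row per ent/year combination
   group_by(ent) %>%                    ## interpolate within each 'ent' only
   mutate(pobtot = zoo::na.approx(pobtot, na.rm = FALSE)) %>%  ## 2006-2008 stay NA
   ungroup()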
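
As a rough illustration of what the interpolation step computes, here is a base-R sketch for ent 1 only. It calls approx() directly (zoo::na.approx is documented as building on it); the variable names are made up for this example. The years after 2005 come back as NA because linear interpolation does not extrapolate by default.

```r
# Known observations for a single entity (ent 1 from the question)
year_known   <- c(1995, 2000, 2005)
pobtot_known <- c(50, 60, 70)

# Years we want values for, including the three forecast years
year_all <- 1995:2008

# approx() interpolates linearly between the known points; with the default
# rule = 1, anything outside [1995, 2005] is returned as NA
interp <- approx(year_known, pobtot_known, xout = year_all)$y

data.frame(year = year_all, pobtot = interp)
```

So the forecast years still need a separate extrapolation or forecasting step.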

Assuming you want linear interpolation, R uses approx() for such things by default, e.g. for drawing lines in a plot. We can also use that function to interpolate the years. It doesn't extrapolate, though, so for the forecast we could use forecast::ets() with default settings, which fits an exponential smoothing state space model. Note, however, that this may also produce negative values; OP hasn't stated what is needed in such a case. Anyway, in a by() approach we could do:

library(forecast)
p <- 3  ## define number of years for prediction

res <- do.call(rbind, by(z, z$ent, function(x) {
  yseq <- min(x$year):(max(x$year) + p)  ## sequence of years + prediction
  a <- approx(x$year, x$pobtot, head(yseq, -p))$y  ## linear interpolation
  f <- forecast(ets(a), h = p)  ## predict `p` years
  r <- c(a, f$mean)  ## combine interpolation and prediction
  data.frame(ent=x$ent[1], year=yseq, pobtot=r)  ## output as data frame
}))

Result

res
#      ent year pobtot
# 1.1    1 1995   50.0
# 1.2    1 1996   52.0
# 1.3    1 1997   54.0
# 1.4    1 1998   56.0
# 1.5    1 1999   58.0
# 1.6    1 2000   60.0
# 1.7    1 2001   62.0
# 1.8    1 2002   64.0
# 1.9    1 2003   66.0
# 1.10   1 2004   68.0
# 1.11   1 2005   70.0
# 1.12   1 2006   72.0
# 1.13   1 2007   74.0
# 1.14   1 2008   76.0
# 2.1    2 1995   10.0
# 2.2    2 1996    8.8
# 2.3    2 1997    7.6
# 2.4    2 1998    6.4
# 2.5    2 1999    5.2
# 2.6    2 2000    4.0
# 2.7    2 2001    3.4
# 2.8    2 2002    2.8
# 2.9    2 2003    2.2
# 2.10   2 2004    1.6
# 2.11   2 2005    1.0
# 2.12   2 2006    0.4
# 2.13   2 2007   -0.2
# 2.14   2 2008   -0.8
# 3.1    3 1995  100.0
# 3.2    3 1996  101.0
# 3.3    3 1997  102.0
# 3.4    3 1998  103.0
# 3.5    3 1999  104.0
# 3.6    3 2000  105.0
# 3.7    3 2001  106.0
# 3.8    3 2002  107.0
# 3.9    3 2003  108.0
# 3.10   3 2004  109.0
# 3.11   3 2005  110.0
# 3.12   3 2006  111.0
# 3.13   3 2007  112.0
# 3.14   3 2008  113.0

We could quickly check this in a plot, which, apart from the negative values of entity 2, looks quite reasonable.

with(res, plot(year, pobtot, type='n', main='z'))
with(res[res$year < 2006, ], points(year, pobtot, pch=20, col=3))
with(res[res$year > 2005, ], points(year, pobtot, pch=20, col=4))
with(res[res$year %in% z$year, ], points(year, pobtot, pch=20, col=1))
abline(h=0, lty=3)
legend(2005.25, 50, c('measurem.', 'interpol.', 'extrapol.'), pch=20,
       col=c(1, 3, 4), cex=.8, bty='n')
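
OP didn't say how negative forecasts should be treated. If a population below zero is not meaningful, one option (an assumption here, not something stated in the question) is to clamp the values at zero with pmax(). A minimal sketch using the last rows of entity 2 from the result above:

```r
# Last measured value and the three forecasts for ent 2, copied from the result
ent2 <- data.frame(year = 2005:2008, pobtot = c(1.0, 0.4, -0.2, -0.8))

# Assumption: negative populations are not meaningful, so floor them at 0
ent2$pobtot_clamped <- pmax(ent2$pobtot, 0)

ent2
```

The same call could be applied to the whole result, e.g. res$pobtot <- pmax(res$pobtot, 0), if clamping is acceptable for the use case.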
