How to interpolate missing values in a time series, limited by the number of sequential NAs (R)?

I have missing values in a time series of dates. For example:

set.seed(101)

df <- data.frame(DATE = as.Date(c('2012-01-01', '2012-01-02', 
'2012-01-03', '2012-01-05', '2012-01-06', '2012-01-15', '2012-01-18', 
'2012-01-19', '2012-01-20', '2012-01-22')),
                 VALUE = rnorm(10, mean = 5, sd = 2))

How can I write a function that fills in all the missing dates between the first and last date (i.e. 2012-01-01 and 2012-01-22), then uses interpolation (linear and smoothing spline) to fill the missing values, but never across more than 3 sequential missing values (i.e. no interpolation between 2012-01-06 and 2012-01-15)?

The function will be applied to a very large data frame. I have been able to write a function that uses linear interpolation to fill all missing values between two dates (see below), but I cannot figure out how to stop it from interpolating across long stretches of missing values.

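interpolate.V <- function(df){

  # sort data by time
  df <- df[order(df$DATE), ]

  # linearly interpolate VALUE for every day between the first and last DATE
  temp <- with(df, data.frame(approx(DATE, VALUE,
               xout = seq(DATE[1], DATE[nrow(df)], by = "day"))))
  colnames(temp) <- c("DATE", "VALUE_INTERPOLATED")
  # carry over the ST_ID of the first row (does nothing if df has no ST_ID column)
  temp$ST_ID <- df$ST_ID[1]
  # all = TRUE keeps every date; TRUE is spelled out because T can be reassigned
  out <- merge(df, temp, all = TRUE)

  return(out)
}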

Any help will be greatly appreciated!

Thanks

Function that adds rows for all missing dates:

date.range <- function(sub){

  sub$DATE <- as.Date(sub$DATE)
  DATE <- seq.Date(min(sub$DATE), max(sub$DATE), by="day")
  all.dates <- data.frame(DATE)
  out <- merge(all.dates, sub, all = TRUE)

  return(out)
}
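As a quick check, here is a small sketch that applies date.range to the example data from the question and counts the resulting gaps. The names full and gap.lengths are only illustrative and do not appear in the original post.

```r
# Example data from the question
set.seed(101)
df <- data.frame(
  DATE = as.Date(c("2012-01-01", "2012-01-02", "2012-01-03", "2012-01-05",
                   "2012-01-06", "2012-01-15", "2012-01-18", "2012-01-19",
                   "2012-01-20", "2012-01-22")),
  VALUE = rnorm(10, mean = 5, sd = 2)
)

# Same function as above, repeated so this block runs on its own
date.range <- function(sub){
  sub$DATE <- as.Date(sub$DATE)
  DATE <- seq.Date(min(sub$DATE), max(sub$DATE), by = "day")
  all.dates <- data.frame(DATE)
  out <- merge(all.dates, sub, all = TRUE)
  return(out)
}

full <- date.range(df)   # illustrative name
nrow(full)               # 22 rows, one per day from 2012-01-01 to 2012-01-22
sum(is.na(full$VALUE))   # 12 missing values

# Lengths of each run of consecutive NAs, in date order
gap.lengths <- with(rle(is.na(full$VALUE)), lengths[values])
gap.lengths              # 1 8 2 1
```

Only the run of 8 NAs (2012-01-07 to 2012-01-14) is longer than 3, so that is the one stretch that should stay unfilled.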

Use na.approx or na.spline from the zoo package with the maxgap argument. maxgap is the maximum number of consecutive NAs to fill; longer gaps are left as NA:

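library(zoo)  # provides na.approx() and na.spline()

interpolate.zoo <- function(df){
  # fill runs of at most 3 consecutive NAs; na.rm = FALSE keeps the output as long as the input
  df$VALUE_INT <- na.approx(df$VALUE, maxgap = 3, na.rm = FALSE)
  return(df)
}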
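Putting the two steps together, the following is a minimal sketch rather than a definitive implementation. It assumes the zoo package is installed. The helper name fill_gaps and the column names VALUE_LINEAR and VALUE_SPLINE are made up for illustration. Note that na.spline fits an interpolating cubic spline, which may not be exactly the "smoothing spline" the question has in mind.

```r
library(zoo)  # provides na.approx() and na.spline()

# Example data from the question
set.seed(101)
df <- data.frame(
  DATE = as.Date(c("2012-01-01", "2012-01-02", "2012-01-03", "2012-01-05",
                   "2012-01-06", "2012-01-15", "2012-01-18", "2012-01-19",
                   "2012-01-20", "2012-01-22")),
  VALUE = rnorm(10, mean = 5, sd = 2)
)

# Hypothetical helper (not from the original post): add a row for every
# missing day, then interpolate only across gaps of at most `maxgap` NAs.
fill_gaps <- function(df, maxgap = 3) {
  all.dates <- data.frame(DATE = seq(min(df$DATE), max(df$DATE), by = "day"))
  out <- merge(all.dates, df, all.x = TRUE)  # sorted by DATE, NA where missing

  # Rows are now evenly spaced by day, so interpolating over row position
  # is equivalent to interpolating over time.
  out$VALUE_LINEAR <- na.approx(out$VALUE, maxgap = maxgap, na.rm = FALSE)
  out$VALUE_SPLINE <- na.spline(out$VALUE, maxgap = maxgap, na.rm = FALSE)
  out
}

res <- fill_gaps(df)
res[6:15, ]  # rows for 2012-01-07 to 2012-01-14 remain NA in both new columns
```

For a very large data frame with several series, the same helper could be applied per group (for example with split and lapply), so that interpolation never runs across the boundary between two series.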
