
Interpolation of a time series with missing values in a column in R

I have looked at the imputeTS and zoo packages, but they do not seem to work. My current data is:

group   timeseries (character)
  1   2017-05-17 04:00:00
  1   2017-05-17 04:01:00
  1           NA
  1           NA
  1   2017-05-17 05:00:00
  1   2017-05-17 06:00:00
  2           NA
  2   2017-05-17 04:31:00
  2           NA
  2           NA
  2           NA
  2   2017-05-17 05:31:00

I would like to fill in each NA with an interpolated time, so that the filled value lies between the times in the rows before and after it. I should also point out that each time series belongs to a group, meaning the time resets for each group.

I will provide a picture of the actual data to make this clearer.

Thanks in advance for the help!

na.approx in the zoo package can do this, and the grouping can be handled without loops, using either tapply in base R or a grouped operation in data.table.

For your data set:

# Rebuild the example data; NA marks the missing timestamps
df <- read.table(text = "
  group   timeseries
  1   '2017-05-17 04:00:00'
  1   '2017-05-17 04:01:00'
  1   NA
  1   NA
  1   '2017-05-17 05:00:00'
  1   '2017-05-17 06:00:00'
  2   NA
  2   '2017-05-17 04:31:00'
  2   NA
  2   NA
  2   NA
  2   '2017-05-17 05:31:00'
",
colClasses = c("integer", "POSIXct"),
header = TRUE)

Write a function that coerces the vector to a zoo object, interpolates the NAs, and extracts the result:

library(zoo)
# na.rm = FALSE keeps leading and trailing NAs, so the output has the same length as the input
foo <- function(x) coredata(na.approx(zoo(x), na.rm = FALSE))

Example using tapply in base R to apply foo to each group:

df2 <- df  # make a copy
# tapply returns the results in group order, so this assumes the rows are already sorted by group
df2$timeseries <- do.call(c, tapply(df2$timeseries, INDEX = df2$group, foo))

Example using by in data.table to apply foo to each group:

library(data.table)
DT <- data.table(df)
# := updates the timeseries column in place, one group at a time
DT[, timeseries := foo(timeseries), by = "group"]
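For readers without zoo installed, here is a minimal base R sketch of the same per-group idea. It is not from the answer above: interp_time and dat are hypothetical names, and it assumes the timestamps are in UTC and that linear interpolation by row position is what is wanted. The leading NA in group 2 has no earlier value to interpolate from, so it stays NA.

```r
# Hypothetical helper (an assumption, not part of zoo): linear interpolation
# of timestamps, given as seconds since 1970-01-01 UTC, by row position.
interp_time <- function(x) {
  secs <- as.numeric(x)
  ok <- !is.na(secs)
  if (sum(ok) < 2) return(secs)   # approx() needs at least two known points
  # With the default rule = 1, positions outside the known range stay NA
  approx(which(ok), secs[ok], xout = seq_along(secs))$y
}

dat <- data.frame(
  group = rep(c(1, 2), each = 6),
  timeseries = as.POSIXct(
    c("2017-05-17 04:00:00", "2017-05-17 04:01:00", NA, NA,
      "2017-05-17 05:00:00", "2017-05-17 06:00:00",
      NA, "2017-05-17 04:31:00", NA, NA, NA, "2017-05-17 05:31:00"),
    tz = "UTC")
)

# ave() applies the helper within each group and keeps the original row order
filled_secs <- ave(as.numeric(dat$timeseries), dat$group, FUN = interp_time)
dat$filled <- as.POSIXct(filled_secs, origin = "1970-01-01", tz = "UTC")
print(dat)
```

Because ave() returns values in the original row order, this variant does not depend on the rows being sorted by group.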

The interpolation functions in imputeTS and zoo do not accept character vectors or timestamps as input (interpolating character data usually does not make sense).

You can, however, pass character vectors to zoo's na.locf function, which carries the last observation forward.
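To make "last observation carried forward" concrete, here is a small base R sketch of the idea. locf_sketch is a hypothetical helper written for illustration, not zoo's implementation, and it leaves leading NAs untouched.

```r
# Hypothetical helper: carry the last non-NA value forward
locf_sketch <- function(x) {
  seen <- cumsum(!is.na(x))        # how many non-NA values seen so far
  c(NA, x[!is.na(x)])[seen + 1]    # seen == 0 maps to NA (nothing to carry yet)
}

filled <- locf_sketch(c(NA, "2017-05-17 04:31:00", NA, NA, "2017-05-17 05:31:00"))
print(filled)
```

Note that this repeats the previous timestamp rather than computing a time between neighbours, so it does not give the interpolation asked for in the question.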

The best solution for your task should be the following (I am assuming your dates are stored as POSIXct):

# Perform the imputation on numeric input (seconds since 1970-01-01 UTC).
# Recent imputeTS releases name this function na_interpolation();
# older releases call it na.interpolation().
temp <- imputeTS::na_interpolation(as.numeric(input))

# Transform the numeric values back to timestamps
as.POSIXct(temp, origin = "1970-01-01", tz = "UTC")

Here input is your vector of POSIXct timestamps. as.numeric() on a POSIXct vector returns seconds since 1970-01-01 00:00:00 UTC, so the origin must be "1970-01-01"; set tz (time zone) to match your timestamps.
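The role of the origin argument can be checked with a quick round trip in base R. This is only an illustrative sketch; t0, secs, back, and shifted are names made up for the example, and UTC is assumed.

```r
# A single timestamp in UTC
t0 <- as.POSIXct("2017-05-17 04:00:00", tz = "UTC")
secs <- as.numeric(t0)   # seconds since 1970-01-01 00:00:00 UTC

# Converting back with the matching origin recovers the timestamp
back <- as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")
print(back == t0)

# A different origin shifts the result by the gap between the two origins
shifted <- as.POSIXct(secs, origin = "1960-01-01", tz = "UTC")
print(difftime(t0, shifted, units = "days"))
```

With a 1960 origin the converted values land ten years too early, which is easy to miss if only the time of day is inspected.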

