R lag/lead irregular time series data

I have an irregular time series data frame with time (seconds) and value columns. I want to add another column, value_2, in which the values are led by delay seconds. So value_2 at time t equals value at time t + delay, or at the first observed time after that. If there is no such observation, the row keeps its own value.

ts=data.frame(
  time=c(1,2,3,5,8,10,11,15,20,23),
  value=c(1,2,3,4,5,6,7,8,9,10)
)

ts_with_delayed_value <- add_delayed_value(ts, "value", 2, "time")

> ts_with_delayed_value
   time value value_2
1     1     1       3
2     2     2       4
3     3     3       4
4     5     4       5
5     8     5       6
6    10     6       8
7    11     7       8
8    15     8       9
9    20     9      10
10   23    10      10

I have my own version of this function, add_delayed_value. Here it is:

add_delayed_value <- function(data, colname, delay, colname_time) {
  colname_delayed <- paste(colname, sprintf("%d", delay), sep="_")
  data[colname_delayed] <- NaN

  for (i in 1:nrow(data)) {
    time_delayed <- data[i, colname_time] + delay
    value_delayed <- data[data[colname_time] >= time_delayed, colname][1]
    if (is.na(value_delayed)) {
      value_delayed <- data[i, colname]
    }
    data[i, colname_delayed] <- value_delayed
  }

  return(data)
}

Is there a way to vectorize this routine to avoid the slow loop?

I'm quite new to R, so this code probably has lots of issues. What can be improved about it?

You could try:

library(dplyr)
library(zoo)
na.locf(ts$value[sapply(ts$time, function(x) min(which(ts$time - x >=2 )))])
[1]  3  4  4  5  6  8  8  9 10 10
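Note that sapply still calls an R function once per row. As a rough sketch of a fully vectorized base R alternative, findInterval can locate the first time at or after time + delay in a single call. The function name add_delayed_value_vec is hypothetical, and the sketch assumes the time column is sorted in increasing order:

```r
# Hypothetical vectorized counterpart of add_delayed_value (a sketch, not a drop-in guarantee).
# Assumes data[[colname_time]] is sorted in increasing order.
add_delayed_value_vec <- function(data, colname, delay, colname_time) {
  time  <- data[[colname_time]]
  value <- data[[colname]]
  stopifnot(!is.unsorted(time))

  # With left.open = TRUE, findInterval returns i such that time[i] < x <= time[i + 1],
  # so i + 1 is the index of the first observation at or after x.
  idx <- findInterval(time + delay, time, left.open = TRUE) + 1L

  # Indices past the end of the vector yield NA.
  delayed <- value[idx]

  # Same fallback as the loop version: keep the row's own value when nothing is found.
  missing <- is.na(delayed)
  delayed[missing] <- value[missing]

  data[[paste(colname, delay, sep = "_")]] <- delayed
  data
}

ts <- data.frame(
  time  = c(1, 2, 3, 5, 8, 10, 11, 15, 20, 23),
  value = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
)
res <- add_delayed_value_vec(ts, "value", 2, "time")
print(res)
```

On the sample data this should reproduce the value_2 column shown in the question. The left.open argument requires R 3.3.0 or later.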

What you want is not clear; please give pseudocode or a formula. It looks like this is what you want... From what I understand, the last value should be NA.

library(data.table)
setDT(ts,key='time')
ts_delayed = ts[,.(time_delayed=time+2)]
setkey(ts_delayed,time_delayed)
ts[ts_delayed,roll=-Inf]
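The rolling join above returns the matched rows keyed by the shifted time. As a sketch of how the result could be attached to the original table as a value_2 column, with the question's fallback to the row's own value, one possibility is the following. The names dt, lookup, and matched are made up for illustration, and fifelse assumes a reasonably recent data.table:

```r
library(data.table)

delay <- 2  # lead in seconds (assumed, matching the question's example)

dt <- data.table(
  time  = c(1, 2, 3, 5, 8, 10, 11, 15, 20, 23),
  value = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
)

# One lookup row per original row, holding the shifted time.
lookup <- data.table(time = dt$time + delay)

# roll = -Inf carries the next observation backward, so each shifted time
# picks up the value at the first observed time at or after it (NA past the end).
matched <- dt[lookup, on = "time", roll = -Inf, value]

# Fall back to the row's own value where no later observation exists.
dt[, value_2 := fifelse(is.na(matched), value, matched)]
print(dt)
```

Dropping the last step leaves NA in the final row, which matches this answer's reading of the question.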

This should work for your data. If you want to make a general function, you'll have to play around with lazyeval, which honestly might not be worth it.

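library(dplyr)
library(zoo)

delay = 2  # lead in seconds; not defined in the original snippet

# Fill each NA with the next observed value (next observation carried backward)
carry_back = . %>% na.locf(na.rm = TRUE, fromLast = TRUE)

# Regular one-second grid spanning the observed times (assumes whole-second times)
tibble(time = 
         with(ts, 
              seq(first(time), 
                  last(time) ) ) ) %>%
  left_join(ts, by = "time") %>%
  transmute(value_2 = carry_back(value),
            time = time - delay) %>%
  right_join(ts, by = "time") %>%
  # Rows with no observation at or after time + delay fall back to the last value
  mutate(value_2 = 
           value_2 %>%
           is.na %>%
           ifelse(last(value), value_2) )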

collapse::flag supports fast lagging of irregular time series and panels; see also my answer here. To get your exact result, you would have to fill the missing values introduced by flag with a function such as data.table::nafill with the option "locf". The combination of these two functions is likely to be the most parsimonious and efficient solution compared to what was suggested previously.
