
dplyr: Add multiple lags while growing the data_frame

I want to add multiple lags of the data, but I want each lagged series to be kept in full rather than cut off at the existing height of the data_frame.

Here is some basic code to create multiple lags (HT: https://gist.github.com/drsimonj/2038ff9f9c67063f384f10fac95de566):

library(dplyr)

# create a basic data_frame (data_frame() is deprecated in favour of tibble())
df_foo = tibble(
  x = 1:12,
  y = runif(12)
)

# create functions to generate multiple lags
lags = 1:3
lag_names = paste0("(Lag ", lags, ")")
lag_functions = setNames(paste("dplyr::lag(., ", lags, ")"), lag_names)

# generate multiple lags (funs_() is deprecated in recent dplyr versions)
df_foo_lag = df_foo %>% 
  mutate_at(
    vars("x", "y"),
    funs_(lag_functions)
  ) 

This gives:

> df_foo_lag
# A tibble: 12 x 8
       x      y `x_(Lag 1)` `y_(Lag 1)` `x_(Lag 2)` `y_(Lag 2)` `x_(Lag 3)` `y_(Lag 3)`
   <int>  <dbl>       <int>       <dbl>       <int>       <dbl>       <int>       <dbl>
 1     1 0.847           NA      NA              NA      NA              NA      NA    
 2     2 0.966            1       0.847          NA      NA              NA      NA    
 3     3 0.231            2       0.966           1       0.847          NA      NA    
 4     4 0.324            3       0.231           2       0.966           1       0.847
 5     5 0.350            4       0.324           3       0.231           2       0.966
 6     6 0.750            5       0.350           4       0.324           3       0.231
 7     7 0.415            6       0.750           5       0.350           4       0.324
 8     8 0.377            7       0.415           6       0.750           5       0.350
 9     9 0.474            8       0.377           7       0.415           6       0.750
10    10 0.108            9       0.474           8       0.377           7       0.415
11    11 0.398           10       0.108           9       0.474           8       0.377
12    12 0.0450          11       0.398          10       0.108           9       0.474
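The lag-generation step above relies on `funs_()`, which recent dplyr versions deprecate. Below is a minimal sketch of the same step using `across()`, assuming dplyr >= 1.0. The list `lag_fns` is a hypothetical helper and is not part of the original code. Like the original, this keeps only 12 rows, so the last values of each lagged series are still dropped.

```r
library(dplyr)

df_foo <- tibble(x = 1:12, y = runif(12))
lags <- 1:3

# Hypothetical helper: one lag function per lag order, named "(Lag k)"
lag_fns <- setNames(
  lapply(lags, function(k) {
    force(k)  # capture the current lag order inside the closure
    function(v) dplyr::lag(v, n = k)
  }),
  paste0("(Lag ", lags, ")")
)

# Apply every lag function to x and y; new columns are named e.g. "x_(Lag 1)"
df_foo_lag <- df_foo %>%
  mutate(across(c(x, y), lag_fns, .names = "{.col}_{.fn}"))
```

With `.names = "{.col}_{.fn}"`, the new columns are grouped by source column (all `x` lags, then all `y` lags). The original code instead interleaves them by lag order.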

But this is not what I want. I want rows to be added to the bottom of the data_frame so that the entire lagged series is included:

# what is required
df_foo_lag %>% 
  add_row(
    x = NA,
    y = NA, 
    `x_(Lag 1)` = 12,
    `y_(Lag 1)` = 0.0450,
    `x_(Lag 2)` = 11,
    `y_(Lag 2)` = 0.398,
    `x_(Lag 3)` = 10,
    `y_(Lag 3)` = 0.108
  ) %>% 
  add_row(
    x = NA,
    y = NA, 
    `x_(Lag 1)` = NA,
    `y_(Lag 1)` = NA,
    `x_(Lag 2)` = 12,
    `y_(Lag 2)` = 0.0450,
    `x_(Lag 3)` = 11,
    `y_(Lag 3)` = 0.398
  ) %>% 
  add_row(
    x = NA,
    y = NA, 
    `x_(Lag 1)` = NA,
    `y_(Lag 1)` = NA,
    `x_(Lag 2)` = NA,
    `y_(Lag 2)` = NA,
    `x_(Lag 3)` = 12,
    `y_(Lag 3)` = 0.0450
  )

This gives what I want:

# A tibble: 15 x 8
       x       y `x_(Lag 1)` `y_(Lag 1)` `x_(Lag 2)` `y_(Lag 2)` `x_(Lag 3)` `y_(Lag 3)`
   <int>   <dbl>       <dbl>       <dbl>       <dbl>       <dbl>       <dbl>       <dbl>
 1     1  0.847           NA      NA              NA      NA              NA      NA    
 2     2  0.966            1       0.847          NA      NA              NA      NA    
 3     3  0.231            2       0.966           1       0.847          NA      NA    
 4     4  0.324            3       0.231           2       0.966           1       0.847
 5     5  0.350            4       0.324           3       0.231           2       0.966
 6     6  0.750            5       0.350           4       0.324           3       0.231
 7     7  0.415            6       0.750           5       0.350           4       0.324
 8     8  0.377            7       0.415           6       0.750           5       0.350
 9     9  0.474            8       0.377           7       0.415           6       0.750
10    10  0.108            9       0.474           8       0.377           7       0.415
11    11  0.398           10       0.108           9       0.474           8       0.377
12    12  0.0450          11       0.398          10       0.108           9       0.474
13    NA NA               12       0.0450         11       0.398          10       0.108
14    NA NA               NA      NA              12       0.0450         11       0.398
15    NA NA               NA      NA              NA      NA              12       0.0450

What is a programmatic way to achieve this?

Thanks.

An option would be:

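library(tidyverse)
library(rowr)  # provides cbind.fill(); note that rowr has been archived from CRAN
# one data frame per lag order k (0 = original); each column is prefixed with k NAs
l1 <- map(c(0, lags), function(k) df_foo %>% 
            summarise_all(list(~ list(c(rep(NA_real_, k), .)))) %>% 
            unnest(cols = everything()))
# column-bind the pieces, padding the shorter ones with NA at the bottom
res <- do.call(cbind.fill, c(l1, fill = NA))
names(res)[-(1:2)] <- paste(names(df_foo), 
        rep(lag_names, each = ncol(df_foo)), sep = "_")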
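The `rowr` package that supplies `cbind.fill()` is no longer on CRAN. Below is a sketch of the same idea that needs only dplyr/tibble. Each shifted copy is padded with NA on both ends, so every column already has `nrow(df_foo) + max(lags)` entries and can be bound directly. `shift_grow()` is a hypothetical helper, not a function from any package.

```r
library(dplyr)

df_foo <- tibble(x = 1:12, y = runif(12))
lags <- 1:3
max_lag <- max(lags)

# Hypothetical helper: shift v down by k positions inside a vector of
# length(v) + max_lag, padding with k leading and (max_lag - k) trailing NAs.
shift_grow <- function(v, k, max_lag) {
  c(rep(NA, k), v, rep(NA, max_lag - k))
}

# Lag order 0 keeps the original columns; orders 1..3 add the lagged copies.
pieces <- lapply(c(0, lags), function(k) {
  out <- lapply(df_foo, shift_grow, k = k, max_lag = max_lag)
  names(out) <- if (k == 0) names(df_foo) else paste0(names(df_foo), "_(Lag ", k, ")")
  out
})

# Flatten the list of column lists and build a single 15-row tibble
res <- as_tibble(do.call(c, pieces))
```

Because the padding is computed up front, no column ever needs to be filled afterwards, and `x` keeps its integer type.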

You can simply add empty rows before computing the lags:

# generate multiple lags, first appending max(lags) empty rows
# so that the full lagged series fits in the data frame
df_foo_lag = df_foo %>% 
  bind_rows(tibble(.rows = max(lags))) %>% 
  mutate_at(
    vars("x", "y"),
    funs_(lag_functions)
  ) 
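The same pad-then-lag idea can be wrapped in a reusable function that avoids the deprecated `funs_()`. The sketch below assumes dplyr >= 1.0. `add_lags_grow()` is a hypothetical name, and the nested loop (lag order outer, column inner) is chosen so the columns come out in the same order as in the question.

```r
library(dplyr)

# Hypothetical helper: grow `df` by max(lags) empty rows, then add one
# lagged copy of each column in `cols` for every lag order in `lags`.
add_lags_grow <- function(df, cols, lags) {
  # Append empty rows first so no lagged value falls off the bottom
  df <- bind_rows(df, tibble(.rows = max(lags)))
  for (k in lags) {
    for (col in cols) {
      # Same naming scheme as the question, e.g. "x_(Lag 1)"
      df[[paste0(col, "_(Lag ", k, ")")]] <- dplyr::lag(df[[col]], n = k)
    }
  }
  df
}

df_foo <- tibble(x = 1:12, y = runif(12))
df_foo_lag <- add_lags_grow(df_foo, cols = c("x", "y"), lags = 1:3)
```

The result has 15 rows. The original `x` and `y` are NA in rows 13 to 15, and the last three rows hold the tail of each lagged series, as in the desired output.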
