
Make a time series data frame in R, question 3 (tidyr)

https://dplyr.tidyverse.org/reference/lead-lag.html

I want to add lag columns, like the ones shown at the URL above, but I have many features.

data <- data.frame(day=c("2010-01-01","2010-01-02","2010-01-03","2010-01-04","2010-01-05"),
           dummy_1=rbinom(5,1,0.5),
           dummy_2=rbinom(5,1,0.5),
           dummy_3=rbinom(5,1,0.5),
           #and so on ...... many dummy_X columns...
           one_hot_1=rbinom(5,1,0.5),
           one_hot_2=rbinom(5,1,0.5),
           one_hot_3=rbinom(5,1,0.5)
           #and so on ...... many one_hot_X columns...
           )


         day dummy_1 dummy_2 dummy_3 one_hot_1 one_hot_2 one_hot_3
1 2010-01-01       1       1       1         0         1         1
2 2010-01-02       0       1       1         0         0         0
3 2010-01-03       1       0       1         0         0         0
4 2010-01-04       0       0       1         1         1         1
5 2010-01-05       0       1       0         0         0         1


I want to get the following result in an easier (tidy) way, with consistent column names:


library(dplyr)
data_2 <- mutate(data, 
                 dummy_1_shift_2 = lag(dummy_1, 2),
                 dummy_1_shift_3 = lag(dummy_1, 3),
                 dummy_1_shift_4 = lag(dummy_1, 4),
                 dummy_1_shift_5 = lag(dummy_1, 5),
                 dummy_1_shift_6 = lag(dummy_1, 6),
                 dummy_1_shift_7 = lag(dummy_1, 7),
                 dummy_1_shift_8 = lag(dummy_1, 8),
                 #and so on ...... many dummy_X_shift_Y columns...
                 one_hot_1_shift_2 = lag(one_hot_1, 2),
                 one_hot_1_shift_3 = lag(one_hot_1, 3),
                 one_hot_1_shift_4 = lag(one_hot_1, 4),
                 one_hot_1_shift_5 = lag(one_hot_1, 5),
                 one_hot_1_shift_6 = lag(one_hot_1, 6),
                 one_hot_1_shift_7 = lag(one_hot_1, 7),
                 one_hot_1_shift_8 = lag(one_hot_1, 8)
                 )

Do you have any ideas on how to do this in R?

Thank you.

We can loop over the column names with purrr::map_dfc, apply lag to each column with transmute_at, and then bind the result back to the original data with bind_cols. The example below uses only the first two feature columns and shifts 2 to 4.

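library(dplyr)
# names(data)[-1] drops the day column; [1:2] keeps the first two features
bind_cols(data, 
       purrr::map_dfc(names(data)[-1][1:2], function(y) data %>% 
                    # create the lagged copies of column y
                    transmute_at(vars(y), list(shift_2=~lag(.,2),
                                               shift_3=~lag(.,3),
                                               shift_4=~lag(.,4))) %>% 
                    # prefix each new column with the source column name
                    rename_all(~paste0(y,"_",.))))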

         day dummy_1 dummy_2 dummy_3 one_hot_1 one_hot_2 one_hot_3 dummy_1_shift_2 dummy_1_shift_3 dummy_1_shift_4 dummy_2_shift_2 dummy_2_shift_3 dummy_2_shift_4
1 2010-01-01       0       1       0         0         0         1              NA              NA              NA              NA              NA              NA
2 2010-01-02       1       1       0         0         0         0              NA              NA              NA              NA              NA              NA
3 2010-01-03       0       0       0         0         1         1               0              NA              NA               1              NA              NA
4 2010-01-04       1       1       1         1         0         1               1               0              NA               1               1              NA
5 2010-01-05       1       1       0         1         1         1               0               1               0               0               1               1
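
The leading NA values in the output above come from how dplyr::lag pads the start of a shifted vector. A minimal sketch on a made-up vector (the values and the name `x` are illustrative only, not taken from the question's data):

```r
library(dplyr)

# Hypothetical input vector, used only to show the padding behaviour
x <- c(10, 20, 30, 40, 50)

# Shift by two positions: the first two entries have no earlier value,
# so lag() fills them with NA
shifted <- lag(x, 2)
print(shifted)
# NA NA 10 20 30
```

With only five rows, any shift of five or more therefore produces a column that is entirely NA.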

We can also use nested lapply calls with dplyr::lag, which builds shifts 1 to 8 for every feature column:

cbind(data, do.call(cbind, lapply(names(data)[-1], function(x) 
    setNames(do.call(cbind.data.frame, lapply(1:8, function(y) 
       dplyr::lag(data[[x]], y))), paste0(x, "_shift_", 1:8)))))
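
As a possible alternative, and assuming dplyr 1.0.0 or later is available, the same columns can probably be built with across() and a named list of lag functions. This is a sketch rather than part of either answer: the names `shifts` and `lag_funs`, the reduced two-feature data frame, and the seed are illustrative assumptions.

```r
library(dplyr)

# Small reproducible stand-in for the question's data (assumed, two features only)
set.seed(1)
data <- data.frame(
  day = c("2010-01-01", "2010-01-02", "2010-01-03", "2010-01-04", "2010-01-05"),
  dummy_1 = rbinom(5, 1, 0.5),
  one_hot_1 = rbinom(5, 1, 0.5)
)

# Shifts to generate; change this vector to get more or fewer lag columns
shifts <- 2:4

# Named list of functions, one per shift; the names become column suffixes
lag_funs <- setNames(
  lapply(shifts, function(n) {
    force(n)  # make sure each closure keeps its own shift value
    function(x) dplyr::lag(x, n)
  }),
  paste0("shift_", shifts)
)

# Apply every function to every column except day.
# With a named list, across() names the results <column>_<function name>.
data_2 <- data %>%
  mutate(across(-day, lag_funs))

print(names(data_2))
```

Because the suffixes come from the names of `lag_funs`, changing `shifts` is enough to change which `X_shift_Y` columns are created.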
