
Create time series cross validation slices by key in the tidyverts package

Is there a way to create time series cross validation sets by key using the tidyverts package? I can't seem to get it right. Below is a reprex of my attempt.

The example involves creating time series cross-validation slices (1 step ahead) for forecasting. The key variable has 2 distinct values, and I would like to have one tsibble containing the time series slices for both keys. When I try to row-bind both tsibbles, I get an error.

library(dplyr)
library(tibble)
library(tsibble)

# helper function
create_cv_slices <- function(data, forecast_horizon) {
  data %>%
    dplyr::slice(1:(nrow(data) - forecast_horizon)) %>%
    tsibble::stretch_tsibble(.init = nrow(data) - 2 * forecast_horizon, .step = 1)
}

# get data
raw_tsbl <- tibble::tribble(
  ~index,      ~key,    ~Revenue,     ~Claims,
  20160101, "series1",  11011836.1, 5386836.696,
  20160201, "series1", 11042641.16, 9967325.715,
  20160301, "series1", 11445687.52, 10947197.89,
  20160401, "series1", 11252943.11, 6980431.415,
  20160101, "series2",    12236155,    12526224,
  20160201, "series2",     8675364,     9812904,
  20160301, "series2",    10081130,     8423497,
  20160401, "series2",    14840111,     8079813
) %>%
  dplyr::mutate(index = tsibble::yearmonth(as.character(index))) %>%
  tsibble::as_tsibble(index = index, key = key)

keys <- unique(raw_tsbl$key)

# split & combine
tbl1 = raw_tsbl %>%
  dplyr::filter(key == keys[1]) %>%
  create_cv_slices(., forecast_horizon = 1) %>%
  tibble::as_tibble()

tbl2 = raw_tsbl %>%
  dplyr::filter(key == keys[2]) %>%
  create_cv_slices(., forecast_horizon = 1) %>%
  tibble::as_tibble()

dplyr::bind_rows(tbl1, tbl2) %>%
  tsibble::as_tsibble(index = index, key = key)
#> Error: A valid tsibble must have distinct rows identified by key and index.
#> Please use `duplicates()` to check the duplicated rows.

Thank you.

It appears that using bind_rows to combine the tsibbles is what doesn't work. Using bind_rows and setting validate = FALSE in the as_tsibble function does create a tsibble, but it displays the tsibble as a daily series instead of a monthly one (which is what it should be). However, using rbind with the same argument setting creates the desired tsibble.

rbind(tbl1, tbl2) %>%
  tsibble::as_tsibble(index = index, key = c(key, .id), validate = FALSE)
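
One way to see why the `as_tsibble(index = index, key = key)` call in the question fails: stretched slices repeat the same (key, index) pairs under different `.id` values, so `key` alone no longer identifies rows. The sketch below uses a small made-up table (the `slices` object and its `value` column are hypothetical stand-ins for the row-bound slices, not the original data) to show the duplication and how adding `.id` to the key resolves it.

```r
library(tsibble)

# Hypothetical stand-in for row-bound CV slices of one series:
# slice 1 holds Jan-Feb, slice 2 holds Jan-Mar.
slices <- tibble::tibble(
  index = yearmonth(as.Date(c("2016-01-01", "2016-02-01",
                              "2016-01-01", "2016-02-01", "2016-03-01"))),
  key   = "series1",
  value = c(1, 2, 1, 2, 3),
  .id   = c(1L, 1L, 2L, 2L, 2L)
)

# With `key` alone, (key, index) pairs repeat across slices.
has_dupes <- any(are_duplicated(slices, key = key, index = index))

# With `.id` added to the key, every row is uniquely identified,
# so as_tsibble() can validate the data.
cv_tsbl <- as_tsibble(slices, index = index, key = c(key, .id))
```

With `.id` in the key, validation has nothing to reject, so `validate = FALSE` should not be needed for this part of the problem.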

Thanks.

Rather than splitting the data manually by key, you can compute your slices on groups of the tsibble. group_by_key() is a convenience function (with better performance) that is equivalent to group_by(key). The n() function is a group-aware dplyr function that gives the number of observations in the current group.

library(dplyr)
library(tibble)
library(tsibble)

# get data
raw_tsbl <- tibble::tribble(
  ~index,      ~key,    ~Revenue,     ~Claims,
  20160101, "series1",  11011836.1, 5386836.696,
  20160201, "series1", 11042641.16, 9967325.715,
  20160301, "series1", 11445687.52, 10947197.89,
  20160401, "series1", 11252943.11, 6980431.415,
  20160101, "series2",    12236155,    12526224,
  20160201, "series2",     8675364,     9812904,
  20160301, "series2",    10081130,     8423497,
  20160401, "series2",    14840111,     8079813
) %>%
  dplyr::mutate(index = tsibble::yearmonth(as.character(index))) %>%
  tsibble::as_tsibble(index = index, key = key)

forecast_horizon <- 1

raw_tsbl %>% 
  group_by_key() %>% 
  slice(1:(n() - forecast_horizon)) %>% 
  ungroup() %>% 
  stretch_tsibble(.init = 2, .step = 1)
#> # A tsibble: 10 x 5 [1M]
#> # Key:       .id, key [4]
#>       index key       Revenue    Claims   .id
#>       <mth> <chr>       <dbl>     <dbl> <int>
#>  1 2016 Jan series1 11011836.  5386837.     1
#>  2 2016 Feb series1 11042641.  9967326.     1
#>  3 2016 Jan series2 12236155  12526224      1
#>  4 2016 Feb series2  8675364   9812904      1
#>  5 2016 Jan series1 11011836.  5386837.     2
#>  6 2016 Feb series1 11042641.  9967326.     2
#>  7 2016 Mar series1 11445688. 10947198.     2
#>  8 2016 Jan series2 12236155  12526224      2
#>  9 2016 Feb series2  8675364   9812904      2
#> 10 2016 Mar series2 10081130   8423497      2

Created on 2020-05-08 by the reprex package (v0.3.0)

A slight difference in this code is that .init is set to 2, rather than nrow(data) - 2 * forecast_horizon. For this data it gives the same result; however, if the number of observations differs between keys, it won't. Once dplyr v1.0.0 is released, it will be easier to use tools like group_map() or bind_rows() for the split-apply-combine approach that is necessary to specify different window parameters for each key.
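
A rough sketch of such a split-apply-combine approach, where `.init` depends on each key's length. It assumes recent versions of dplyr and tsibble, in which bind_rows() keeps the yearmonth class. The `toy` data and the `slice_one_key()` helper are hypothetical names introduced for illustration.

```r
library(dplyr)
library(tsibble)

forecast_horizon <- 1

# Toy data (made up): key "a" has 4 observations, key "b" has 5.
toy <- tibble::tibble(
  index = yearmonth(as.Date(c(
    "2016-01-01", "2016-02-01", "2016-03-01", "2016-04-01",
    "2016-01-01", "2016-02-01", "2016-03-01", "2016-04-01", "2016-05-01"
  ))),
  key   = c(rep("a", 4), rep("b", 5)),
  value = 1:9
)

# Hypothetical helper: build the slices for a single key, with `.init`
# derived from that key's own number of observations.
slice_one_key <- function(d, h) {
  d <- as_tsibble(d, index = index, key = key)
  d %>%
    slice(1:(nrow(d) - h)) %>%
    stretch_tsibble(.init = nrow(d) - 2 * h, .step = 1) %>%
    as_tibble()
}

cv <- split(toy, toy$key) %>%                      # split
  lapply(slice_one_key, h = forecast_horizon) %>%  # apply
  bind_rows() %>%                                  # combine
  as_tsibble(index = index, key = c(key, .id))
```

Note that `.id` restarts at 1 for each key here, so the same `.id` can refer to windows of different lengths across keys.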
