
Given a tsibble with more than one key, is tidyverts able to box_cox() each time series using a respective lambda_guerrero value per time series?

My question is: if I have a tsibble with more than one key (n_keys > 1) and one or more key variables (key_vars >= 1), can the tidyverts suite apply a box_cox transformation to each time series separately, using that series' own lambda_guerrero value? Below is my (first) attempt at a minimal reproducible example.

For example: I'm wondering if "step 5" is possible using the tidyverts suite without receiving an error. Rather than apply lambda1=0.36 to concessional, general, and aggregated, as seen in "step 4" without error, I'd like to apply 0.25 to concessional, 0.66 to general, and 0.36 to aggregated, if possible.

Thank you!

library(tidyverse) 
library(lubridate)
library(tsibble)
library(tsibbledata)
library(fabletools)
library(fable)
library(feasts)
library(distributional)

step 1: one key, without Transformation:

tsibbledata::PBS %>% summarize(Cost = sum(Cost)) %>% autoplot(Cost)

step 2: one key, with Transformation:

Similar to an example in FPP3, Section 3.1. For reference: https://otexts.com/fpp3/transformations.html

lambda1 <- tsibbledata::PBS %>%
  summarize(Cost = sum(Cost)) %>%
  features(Cost, features = guerrero) %>%
  pull(lambda_guerrero) # [1] 0.3642197

tsibbledata::PBS %>% summarize(Cost = sum(Cost)) %>% autoplot(box_cox(Cost,lambda1))

step 3: three keys, without Transformation:

tsibbledata::PBS %>% aggregate_key(Concession, Cost = sum(Cost)) %>% autoplot(Cost)

step 4: three keys, with one Transformation:

tsibbledata::PBS %>%
  aggregate_key(Concession, Cost = sum(Cost)) %>%
  autoplot(box_cox(Cost, lambda1))

step 5: three keys, with three Transformations:

lambda2 <- tsibbledata::PBS %>%
  aggregate_key(Concession, Cost = sum(Cost)) %>%
  features(Cost, features = guerrero) %>%
  pull(lambda_guerrero) # [1] 0.2518823 0.6577645 0.3642197

# For reference, the features() result before pull() is:
# A tibble: 3 x 2
#   Concession   lambda_guerrero
#   <chr*>                 <dbl>
# 1 Concessional           0.252
# 2 General                0.658
# 3 <aggregated>           0.364

tsibbledata::PBS %>%
  aggregate_key(Concession, Cost = sum(Cost)) %>%
  autoplot(box_cox(Cost,lambda2)) # caused an error

The issue with your last attempt is related to the lengths of the values passed to box_cox(Cost, lambda2). Cost has length 612 (204 observations for 3 series), and lambda2 has length 3. So R will try to replicate the values in lambda2 so that the lengths match (called "recycling").

However, it does this wrong in this case. It matches Cost[1] with lambda2[1] (correct), Cost[2] with lambda2[2] (incorrect), Cost[3] with lambda2[3] (incorrect), Cost[4] with lambda2[1] (correct), etc. The correct recycling of the parameters is Cost[1:204] with lambda2[1], Cost[205:408] with lambda2[2], and Cost[409:612] with lambda2[3], so we need to ensure this.
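As a rough base R illustration of that pairing, here is a small sketch with toy stand-ins (the vectors `cost` and `lambda` are made up for this example, not taken from PBS): two series of three observations each, stacked one after the other. It only shows how element-wise recycling lines values up; it does not reproduce what box_cox() does internally.

```r
# Hypothetical toy data: 2 series x 3 observations, stacked by series
cost   <- c(10, 20, 30, 100, 200, 300)
lambda <- c(0.25, 0.66)

# Element-wise recycling cycles through lambda: 1, 2, 1, 2, 1, 2
recycled <- rep_len(lambda, length(cost))

# Intended pairing: the same lambda for every observation of a series
blockwise <- rep(lambda, each = 3)

# Side-by-side view of the two pairings
print(data.frame(cost, recycled, blockwise))
```

Only the `blockwise` column gives each series a single lambda; the `recycled` column alternates between the two values within each series.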

This can be done with rep(lambda2, each = 204), but the best and safest approach is to use a join operation. This ensures that each parameter is matched to the correct series (and prevents issues with row ordering). The code below shows how this can be done with left_join(), which matches the lambda values to the data based on the Concession column. Note that the plot doesn't look very good, as the transformations (and data) produce values on very different scales. To fix this I recommend faceting to produce different y-axis scales for each series (as done below also).

library(fpp3)
lambda2 <- tsibbledata::PBS %>%
  aggregate_key(Concession, Cost = sum(Cost)) %>%
  features(Cost, features = guerrero)

lambda2
#> # A tibble: 3 x 2
#>   Concession   lambda_guerrero
#>   <chr*>                 <dbl>
#> 1 Concessional           0.252
#> 2 General                0.658
#> 3 <aggregated>           0.364

tsibbledata::PBS %>%
  aggregate_key(Concession, Cost = sum(Cost)) %>%
  # Add lambda to the dataset, matching based on the key variable
  left_join(lambda2, by = "Concession") %>% 
  autoplot(box_cox(Cost, lambda_guerrero))

tsibbledata::PBS %>%
  aggregate_key(Concession, Cost = sum(Cost)) %>%
  # Add lambda to the dataset, matching based on the key variable
  left_join(lambda2, by = "Concession") %>% 
  autoplot(box_cox(Cost, lambda_guerrero)) + 
  facet_grid(rows = vars(Concession), scales = "free_y")

Created on 2021-01-09 by the reprex package (v0.3.0)
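To see why matching on the key is safer than expanding by position, here is a small base R sketch on toy data (the data frames `obs` and `lam` and their columns are hypothetical, chosen only for illustration). It uses match() as a stand-in for the lookup that a left join on the key performs.

```r
# Hypothetical toy data: rows are deliberately NOT grouped by series
obs <- data.frame(
  series = c("b", "a", "b", "a"),
  cost   = c(100, 10, 200, 20)
)
lam <- data.frame(series = c("a", "b"), lambda = c(0.25, 0.66))

# Positional expansion assumes rows are sorted by series, which is false here
positional <- rep(lam$lambda, each = 2)

# Key-based lookup follows the series label, whatever the row order
by_key <- lam$lambda[match(obs$series, lam$series)]

# Compare the two assignments row by row
print(cbind(obs, positional, by_key))
```

In this toy case `positional` gives the first "b" row the lambda meant for "a", while `by_key` assigns 0.66 to every "b" row and 0.25 to every "a" row.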
