How to forecast time series for multiple companies in R?

Question

I have a data frame that spans 5 years, with ~500 companies and several fundamental statistics (e.g. sales, number of employees, ROA). Here's an example of what it could look like. Note that all numbers apart from the Year are picked completely at random.

Name	Year	Sales	Size	ROA
Firm A	2020	857	12000	0.45
Firm B	2020	112	3500	0.32
Firm C	2020	666	7000	0.44
Firm A	2019	860	12000	0.47
Firm B	2019	150	3000	0.31
Firm C	2019	700	6000	0.44
...	...	...	...	...
Firm A	2015	560	10000	0.47
Firm B	2015	100	2000	0.31
Firm C	2015	300	4000	0.44

How would you suggest I forecast the 2021 ROA for each firm, taking the 5-year span (2015-2020) into account? I tried experimenting with the forecast package, but I haven't found a way to apply it to all firms in bulk. I'm hoping to end up with something like this:

Name	Year	Predicted ROA
Firm A	2021	0.50
Firm B	2021	0.35
Firm C	2021	0.43

I'd be super grateful for any leads!
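Whatever model ends up being used, the "bulk" part can be done in base R: split the data frame by firm, fit one model per piece, and bind the results back into the desired Name / Year / predicted ROA layout. The sketch below uses the ROA values from the example table and, purely as an assumption for illustration, a per-firm straight-line trend (`lm(ROA ~ Year)`) as the model; `forecast_firm` is a hypothetical helper name. With only three years per firm in this toy data, a linear trend is fragile, so treat it as a placeholder for a real forecasting model.

```r
# Example rows from the question (Sales and Size omitted)
df <- data.frame(
  Name = rep(c("Firm A", "Firm B", "Firm C"), times = 3),
  Year = rep(c(2020L, 2019L, 2015L), each = 3),
  ROA  = c(0.45, 0.32, 0.44, 0.47, 0.31, 0.44, 0.47, 0.31, 0.44)
)

# Hypothetical helper: fit one model to a single firm and forecast `year`.
# Assumption: a simple linear trend stands in for a real forecasting model.
forecast_firm <- function(d, year) {
  fit <- lm(ROA ~ Year, data = d)
  data.frame(
    Name          = d$Name[1],
    Year          = year,
    predicted_ROA = unname(predict(fit, newdata = data.frame(Year = year)))
  )
}

# split() yields one data frame per firm; lapply() fits all firms in one go
pieces <- lapply(split(df, df$Name), forecast_firm, year = 2021L)
result <- do.call(rbind, pieces)
rownames(result) <- NULL
print(result)
```

The same split / apply / bind pattern works with any per-series model, for example a function from the forecast package in place of `lm()`.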

Answer 1

I like to use mgcv::gam for forecasting.
I used the simplest possible model, where ROA depends only on Name and a smooth function of Year.
You'll want to increase k depending on how much data you have (the default is 10).
The by variable fits a separate smooth of Year for each Name.

df <- structure(list(Name = c("Firm A", "Firm B", "Firm C", "Firm A", 
                        "Firm B", "Firm C", "Firm A", "Firm B", "Firm C"), 
                     Year = c(2020L, 2020L, 2020L, 2019L, 2019L, 2019L, 2015L, 2015L, 2015L), 
                     Sales = c(857L, 112L, 666L, 860L, 150L, 700L, 560L, 100L, 300L), 
                     Size = c(12000L, 3500L, 7000L, 12000L, 3000L, 6000L, 10000L, 2000L, 4000L), 
                     ROA = c(0.45, 0.32, 0.44, 0.47, 0.31, 0.44, 0.47, 0.31, 0.44)), 
                row.names = c(NA, -9L), class = "data.frame")
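# Name shifts each firm's level; by = as.factor(Name) gives each firm its own
# smooth of Year. k = 3 because this toy data has only 3 distinct years.
gamfit <- mgcv::gam(formula = ROA ~ Name + s(Year, k = 3, by = as.factor(Name)), data = df)
summary(gamfit)
# One row per firm for the year to be forecast
predict_df <- data.frame(Name = sort(unique(df$Name)), 
                         Year = 2021L)
predict_df$ROA <- predict(gamfit, newdata = predict_df)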

predict_df
    Name Year       ROA
1 Firm A 2021 0.3969841
2 Firm B 2021 0.4098413
3 Firm C 2021 0.4055556
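
The `by` argument effectively packs one model per firm into a single fit. A rough base-R analogue, swapping the smooth for a straight line (an assumption; this is not what mgcv fits), is the interaction `Name + Name:Year`: each firm gets its own intercept and its own Year slope, and because no coefficients are shared, the joint fit gives the same 2021 forecasts as fitting each firm separately. With mgcv the correspondence is only approximate, since for example the residual variance is shared across firms when smoothing parameters are estimated.

```r
# Toy data from the answer above (Name, Year, ROA only)
df <- data.frame(
  Name = factor(rep(c("Firm A", "Firm B", "Firm C"), times = 3)),
  Year = rep(c(2020L, 2019L, 2015L), each = 3),
  ROA  = c(0.45, 0.32, 0.44, 0.47, 0.31, 0.44, 0.47, 0.31, 0.44)
)

# One joint model: separate intercept (Name) and separate slope (Name:Year) per firm
joint <- lm(ROA ~ Name + Name:Year, data = df)
newd  <- data.frame(Name = factor(levels(df$Name), levels = levels(df$Name)),
                    Year = 2021L)
joint_pred <- predict(joint, newdata = newd)

# The same forecasts from three independent per-firm regressions
separate_pred <- sapply(levels(df$Name), function(n) {
  fit <- lm(ROA ~ Year, data = df[df$Name == n, ])
  predict(fit, newdata = data.frame(Year = 2021L))
})

all.equal(unname(joint_pred), unname(separate_pred))
```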

Answer 2

The fable package was designed for this sort of thing. Here is an artificial example that mimics the data structure in the question.

library(tidyverse)
library(fable)
# Synthetic data
df <- tibble(
  Name = rep(paste("Firm",c("A","B","C")),6),
  Year = rep(2015:2020, rep(3,6)),
  ROA = runif(18)
)
# Turn it into a tsibble object
df_ts <- df %>%
  as_tsibble(index=Year, key=Name)
# Forecast each firm
fc <- df_ts %>%
  model(ARIMA(ROA)) %>%
  forecast(h=1)
fc
#> # A fable: 3 x 5 [1Y]
#> # Key:     Name, .model [3]
#>   Name   .model      Year           ROA .mean
#>   <chr>  <chr>      <dbl>        <dist> <dbl>
#> 1 Firm A ARIMA(ROA)  2021 N(0.52, 0.14) 0.517
#> 2 Firm B ARIMA(ROA)  2021 N(0.59, 0.07) 0.587
#> 3 Firm C ARIMA(ROA)  2021 N(0.52, 0.11) 0.522

Created on 2021-10-26 by the reprex package (v2.0.1)

Here I have used an ARIMA model, but many other models could be used instead. See my textbook at https://OTexts.com/fpp3 for many examples using fable with ARIMA and other models.
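
For comparison, the "one ARIMA per key" idea can be sketched with base R's `stats::arima()` and `predict()`. Unlike fable's `ARIMA(ROA)`, which selects the model order automatically for each firm, this sketch fixes the order at (0, 1, 0), a random walk, as an assumption for simplicity; with that choice the point forecast is just each firm's last observed ROA. The output reports the forecast mean and its standard error.

```r
set.seed(1)
# Synthetic data in the same shape as above: 3 firms, yearly ROA for 2015-2020
df <- data.frame(
  Name = rep(paste("Firm", c("A", "B", "C")), 6),
  Year = rep(2015:2020, rep(3, 6)),
  ROA  = runif(18)
)

fc <- do.call(rbind, lapply(split(df, df$Name), function(d) {
  d   <- d[order(d$Year), ]
  y   <- ts(d$ROA, start = min(d$Year), frequency = 1)  # one annual series per firm
  fit <- arima(y, order = c(0, 1, 0))                    # assumption: random walk, order fixed by hand
  p   <- predict(fit, n.ahead = 1)                       # one-year-ahead forecast
  data.frame(Name = d$Name[1], Year = max(d$Year) + 1L,
             mean = as.numeric(p$pred), se = as.numeric(p$se))
}))
rownames(fc) <- NULL
fc
```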

Answer 3

Actually, there are tons of ways to do this. The following solution of mine might be slight overkill and not the ideal way to tackle your problem; it is merely an illustration of a scalable modelling workflow for time series forecasting.

Check out the code below and let me know whether it gives you some interesting results. Once you get used to the tidymodels stack and the modeltime framework, this kind of data becomes easy to process.

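suppressPackageStartupMessages(library(tidyverse))
suppressPackageStartupMessages(library(tidymodels))
suppressPackageStartupMessages(library(lubridate))   # ymd()
suppressPackageStartupMessages(library(timetk))      # future_frame(), time_series_split()
suppressPackageStartupMessages(library(modeltime))
suppressPackageStartupMessages(library(modeltime.ensemble))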


#### DATA

h <- 1  # forecast horizon in years

# Synthetic data: 3 firms x 6 years (2015-2020) of random values
data <- data.frame(
  id = rep(paste("Firm",c("A","B","C")),6),
  date = rep(2015:2020, rep(3,6)),
  value = runif(18)
)

# Turn each integer year into a Date (1 January), make id a factor,
# and sort by firm, then date
data <- data %>%
  mutate(
    id   = factor(id),
    date = ymd(paste0(date, "-01-01"))
  ) %>%
  arrange(id, date)

# Append h empty future rows (value = NA) per firm; these are what gets forecast
data <- data %>%
  group_by(id) %>%
  future_frame(
    .date_var   = date,
    .length_out = h,
    .bind_data  = TRUE) %>%
  ungroup() %>% 
  as_tibble() 

# Training and test sets: the last year of dates is held out for assessment
data_splits <- time_series_split(data, assess = "1 year", cumulative = TRUE)



#### PREDICT

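# Two global models, each fitted on all firms at once with id and date as predictors:
# a penalized linear regression (glmnet) ...
model_fit_glmnet <- linear_reg(penalty = 1) %>%
  set_engine("glmnet") %>%
  fit(value ~ ., data = training(data_splits))

# ... and gradient-boosted trees (xgboost)
model_fit_xgboost <- boost_tree("regression", learn_rate = 0.35) %>%
  set_engine("xgboost") %>%
  fit(value ~ ., data = training(data_splits))

# Weighted ensemble of the two models (loadings 4 and 6)
ensemble <- modeltime_table(
  model_fit_glmnet,
  model_fit_xgboost
) %>%
  ensemble_weighted(loadings = c(4, 6)) 

model_tbl <- modeltime_table(ensemble)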

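# Forecast the held-out (future) rows, keeping the original columns
forecast <-
  model_tbl %>%
  modeltime_forecast(
    new_data    = testing(data_splits),
    actual_data = data,
    keep_data   = TRUE
  ) %>%
  group_by(id)  


# Change layout: keep only the predictions (date, predicted value, firm)
forecast <- forecast %>%
  filter(str_detect(.key, "prediction")) %>%
  select(.index, .value, id)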
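
The core ideas of this workflow (global models fitted across all firms at once, empty future rows to predict, and a weighted average of two models) can be sketched in base R without the tidymodels stack. The two `lm()` models below are simple stand-ins, chosen as an assumption, for the glmnet and xgboost fits; the weights mirror `loadings = c(4, 6)`, which, as I understand the modeltime.ensemble documentation, are rescaled to sum to 1 by default (0.4 and 0.6).

```r
set.seed(1)

# Synthetic panel in the same shape as the answer's data: 3 firms x 6 years
df <- data.frame(
  id    = factor(rep(paste("Firm", c("A", "B", "C")), 6)),
  year  = rep(2015:2020, rep(3, 6)),
  value = runif(18)
)

# "Future frame": one empty row per firm for the next year (h = 1)
future <- data.frame(
  id   = factor(levels(df$id), levels = levels(df$id)),
  year = max(df$year) + 1L
)

# Two global models fitted on all firms at once (stand-ins for glmnet and xgboost)
m1 <- lm(value ~ id + year, data = df)  # firm-specific level, trend shared by all firms
m2 <- lm(value ~ id, data = df)         # firm-specific mean only

p1 <- unname(predict(m1, newdata = future))
p2 <- unname(predict(m2, newdata = future))

# Weighted ensemble: loadings c(4, 6) rescaled to sum to 1, i.e. 0.4 and 0.6
loadings <- c(4, 6) / sum(c(4, 6))
future$.value <- loadings[1] * p1 + loadings[2] * p2
future
```

Because both models are fitted on the pooled data, firms can borrow strength from each other (here through the shared trend in `m1`), which is the main appeal of the global-model approach when each firm has only a handful of years.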

Problem description

3 solutions

Solution 1
0 2021-10-23 23:32:23

Solution 2
0 2021-10-25 23:25:17

Solution 3
0 2021-11-03 17:02:50