Use multiple predictors in linear model with purrr map2 function

My question is similar to this one, but now I am trying to use a model with multiple predictors, and I can't figure out how to get the newdata into the predict function.

library(dplyr)
library(lubridate)
library(purrr)
library(tidyr)
library(broom)

set.seed(1234)

First I create a sequence of weeks

wks = seq(as.Date("2010-01-01"), Sys.Date(), by="1 week")

Then I grab the current year

cur_year <- year(Sys.Date())

Here I create a data frame with dummy data

my_data <- data.frame(
  week_ending = wks
) %>% 
  mutate(
    ref_period = week(week_ending),
    yr = year(week_ending),
    PCT.EXCELLENT = round(runif(length(wks), 0, 100),0),
    PCT.GOOD = round(runif(length(wks), 0, 100),0),
    PCT.FAIR = round(runif(length(wks), 0, 100),0),
    PCT.POOR = round(runif(length(wks), 0, 100),0),
    PCT.VERY.POOR = round(runif(length(wks), 0, 100),0),
    pct_trend = round(runif(length(wks), 75, 125),0)
  )

Next I create a nested data frame that has the data for each week of the year as one group.

cond_model <- my_data %>% 
  filter(yr != cur_year) %>% 
  group_by(ref_period) %>% 
  nest(.key=cond_data) 

Here I join this year's data back into the previous years' data by week of the year.

cond_model <- left_join(
  cond_model,
  my_data %>% 
    filter(yr==cur_year) %>% 
    select(week_ending,
           ref_period,
           PCT.EXCELLENT,
           PCT.FAIR,
           PCT.GOOD,
           PCT.POOR,
           PCT.VERY.POOR),
  by = c("ref_period")
) 

And this adds the linear model to the data frame for each week of the year

cond_model <- 
  cond_model %>% 
  mutate(model = map(cond_data,
                     ~lm(pct_trend ~ PCT.EXCELLENT + PCT.GOOD + PCT.FAIR + PCT.POOR + PCT.VERY.POOR, data = .x)))

Now I would like to use the model for each week to predict using this year's data. I tried the following:

cond_model <- 
  cond_model %>% 
  mutate(
    pred_pct_trend = map2_dbl(model, PCT.EXCELLENT + PCT.GOOD + PCT.FAIR + PCT.POOR + PCT.VERY.POOR,
                              ~predict(.x, newdata = data.frame(.y)))
  )

That gives the following error:

Error in mutate_impl(.data, dots) : object 'PCT.EXCELLENT' not found

I then tried nesting my predictors in my data frame...

Create a data frame with just this year's data and nest the predictors

cur_cond <- my_data %>% 
  filter(yr==cur_year) %>% 
  select(week_ending, PCT.EXCELLENT,
         PCT.GOOD, PCT.FAIR, PCT.POOR, PCT.VERY.POOR) %>% 
  group_by(week_ending) %>% 
  nest(.key=new_data) %>% 
  mutate(new_data=map(new_data, ~data.frame(.x)))

Join this into my main data frame

cond_model <- left_join(cond_model, cur_cond)

Now I try the prediction again:

cond_model <- 
  cond_model %>% 
  mutate(
    pred_pct_trend = map2_dbl(model, new_data,
                              ~predict(.x, newdata = data.frame(.y)))
  )

I get the same error as before:

Error in mutate_impl(.data, dots) : object 'PCT.EXCELLENT' not found

I think that the answer could involve performing a flatten() on the predictors, but I can't figure out where that goes in my workflow.

cond_model$new_data[1]

vs.

flatten_df(cond_model$new_data[1])

At this point I have run out of ideas.

Once you get your prediction dataset added in, the main issue is how to deal with the weeks that don't have prediction data (weeks 31-53).

You'll see that when you join the two datasets, the rows without prediction data are filled with NULL. You can use an ifelse statement to give predictions of NA for these rows.
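To see why the NULL entries matter, here is a minimal sketch using the built-in mtcars data as a stand-in (the model and the names fit, empty_newdata and err are illustrative assumptions, not from the question). Wrapping NULL in data.frame() gives a data frame with no columns, so predict() cannot find the predictors, which I believe is the same kind of "object not found" failure shown in the question.

```r
# Stand-in model on built-in data (illustrative only, not the question's data)
fit <- lm(mpg ~ wt + hp, data = mtcars)

# data.frame(NULL) has zero columns, so none of the predictors are available
empty_newdata <- data.frame(NULL)
ncol(empty_newdata)

# Capture the error message instead of stopping the script
err <- tryCatch(
  predict(fit, newdata = empty_newdata),
  error = function(e) conditionMessage(e)
)
err
```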

# Modeling data
cond_model = my_data %>%
    filter(yr != cur_year) %>%
    group_by(ref_period) %>%
    nest(.key = cond_data)

# Create prediction data
cur_cond = my_data %>%
    filter(yr == cur_year) %>% 
    group_by(ref_period) %>% 
    nest( .key = new_data )

# Join these together by week of the year
cond_model = left_join(cond_model, cur_cond, by = "ref_period")

# Models
cond_model = cond_model %>% 
    mutate(model = map(cond_data,
                       ~lm(pct_trend ~ PCT.EXCELLENT + PCT.GOOD + 
                               PCT.FAIR + PCT.POOR + PCT.VERY.POOR, data = .x) ) )

Put an ifelse in to return NA when there is no prediction data.

# Predictions
cond_model %>% 
    mutate(pred_pct_trend = map2_dbl(model, new_data,
                                     ~ifelse(is.null(.y), NA, 
                                             predict(.x, newdata = .y) ) ) )

# A tibble: 53 x 5
   ref_period        cond_data         new_data    model pred_pct_trend
        <dbl>           <list>           <list>   <list>          <dbl>
 1          1 <tibble [7 x 8]> <tibble [1 x 8]> <S3: lm>       83.08899
 2          2 <tibble [7 x 8]> <tibble [1 x 8]> <S3: lm>      114.39089
 3          3 <tibble [7 x 8]> <tibble [1 x 8]> <S3: lm>      215.02055
 4          4 <tibble [7 x 8]> <tibble [1 x 8]> <S3: lm>      130.24556
 5          5 <tibble [7 x 8]> <tibble [1 x 8]> <S3: lm>      112.86516
 6          6 <tibble [7 x 8]> <tibble [1 x 8]> <S3: lm>      107.29866
 7          7 <tibble [7 x 8]> <tibble [1 x 8]> <S3: lm>       52.11526
 8          8 <tibble [7 x 8]> <tibble [1 x 8]> <S3: lm>      106.22482
 9          9 <tibble [7 x 8]> <tibble [1 x 8]> <S3: lm>      128.40858
10         10 <tibble [7 x 8]> <tibble [1 x 8]> <S3: lm>      108.10306
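The same guard can be tried on a small self-contained example. This is only a sketch: it uses mtcars as stand-in data, the names models, new_data and preds are hypothetical, and it swaps ifelse for an explicit if/else with NA_real_ (my substitution, so that the missing case is always a double).

```r
library(purrr)

# Two stand-in models (illustrative only, not the question's weekly models)
models <- list(
  lm(mpg ~ wt + hp, data = mtcars),
  lm(mpg ~ wt + hp, data = mtcars)
)

# Prediction data for the first model only; the second entry is missing (NULL)
new_data <- list(data.frame(wt = 3, hp = 110), NULL)

# Return NA when there is no prediction data, otherwise a single prediction
preds <- map2_dbl(
  models, new_data,
  ~ if (is.null(.y)) NA_real_ else unname(predict(.x, newdata = .y))
)
preds
```

The first element of preds is an ordinary prediction and the second is NA, mirroring what happens to the weeks that have no data for the current year.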
