Using multiple predictors in a linear model with the purrr map2 function

Question

My question is similar to this one, but now I am trying to use a model with multiple predictors, and I can't figure out how to get the newdata into the predict function.

library(dplyr)
library(lubridate)
library(purrr)
library(tidyr)
library(broom)

set.seed(1234)

First I create a sequence of weeks:

wks = seq(as.Date("2010-01-01"), Sys.Date(), by="1 week")

Then I grab the current year:

cur_year <- year(Sys.Date())

Here I create a data frame with dummy data:

my_data <- data.frame(
  week_ending = wks
) %>% 
  mutate(
    ref_period = week(week_ending),
    yr = year(week_ending),
    PCT.EXCELLENT = round(runif(length(wks), 0, 100),0),
    PCT.GOOD = round(runif(length(wks), 0, 100),0),
    PCT.FAIR = round(runif(length(wks), 0, 100),0),
    PCT.POOR = round(runif(length(wks), 0, 100),0),
    PCT.VERY.POOR = round(runif(length(wks), 0, 100),0),
    pct_trend = round(runif(length(wks), 75, 125),0)
  )

Next I create a nested data frame that holds the data for each week of the year as one group:

cond_model <- my_data %>% 
  filter(yr != cur_year) %>% 
  group_by(ref_period) %>% 
  nest(.key=cond_data)

Here I join this year's data back onto the previous years' data by week of the year:

cond_model <- left_join(
  cond_model,
  my_data %>% 
    filter(yr==cur_year) %>% 
    select(week_ending,
           ref_period,
           PCT.EXCELLENT,
           PCT.FAIR,
           PCT.GOOD,
           PCT.POOR,
           PCT.VERY.POOR),
  by = c("ref_period")
)

And this adds a linear model to the data frame for each week of the year:

cond_model <- 
  cond_model %>% 
  mutate(model = map(cond_data,
                     ~lm(pct_trend ~ PCT.EXCELLENT + PCT.GOOD + PCT.FAIR + PCT.POOR + PCT.VERY.POOR, data = .x)))

Now I would like to use the model for each week to predict from this year's data. I tried the following:

cond_model <- 
  cond_model %>% 
  mutate(
    pred_pct_trend = map2_dbl(model, PCT.EXCELLENT + PCT.GOOD + PCT.FAIR + PCT.POOR + PCT.VERY.POOR,
                              ~predict(.x, newdata = data.frame(.y)))
  )

That gives the following error:

Error in mutate_impl(.data, dots) : object 'PCT.EXCELLENT' not found

I then tried nesting my predictors in my data frame...

Create a data frame with just this year's data and nest the predictors:

cur_cond <- my_data %>% 
  filter(yr==cur_year) %>% 
  select(week_ending, PCT.EXCELLENT,
         PCT.GOOD, PCT.FAIR, PCT.POOR, PCT.VERY.POOR) %>% 
  group_by(week_ending) %>% 
  nest(.key=new_data) %>% 
  mutate(new_data=map(new_data, ~data.frame(.x)))

Join this into my main data frame:

cond_model <- left_join(cond_model, cur_cond)

Now I try the prediction again:

cond_model <- 
  cond_model %>% 
  mutate(
    pred_pct_trend = map2_dbl(model, new_data,
                              ~predict(.x, newdata = data.frame(.y)))
  )

I get the same error as before:

Error in mutate_impl(.data, dots) : object 'PCT.EXCELLENT' not found

I think that the answer could involve performing a flatten() on the predictors, but I can't figure out where that goes in my workflow.

cond_model$new_data[1]

vs.

flatten_df(cond_model$new_data[1])

And at this point I have run out of ideas.

Answer 1

Once you get your prediction dataset added in, the main issue is how to deal with the weeks that don't have prediction data (weeks 31-53).

You'll see that when you join the two datasets, the rows without prediction data are filled with NULL. You can use an ifelse statement to return NA as the prediction for these rows.
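To see the NULL fill in isolation, here is a minimal sketch on toy data. The data frames `history` and `current` and their columns `y` and `x` are hypothetical and not part of the original question, and the `nest(name = cols)` syntax assumes tidyr 1.0 or later rather than the `.key` form used in this thread.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Hypothetical toy data: three weeks of history,
# but new observations for weeks 1 and 2 only
history <- tibble(ref_period = c(1, 1, 2, 2, 3, 3),
                  y = c(10, 12, 20, 19, 30, 33))
current <- tibble(ref_period = c(1, 2), x = c(0.5, 0.7))

# One row per week, with the remaining columns packed into a list-column
nested_history <- history %>% nest(cond_data = y)
nested_current <- current %>% nest(new_data = x)

joined <- left_join(nested_history, nested_current, by = "ref_period")

# Week 3 has no match, so its new_data entry is NULL
missing_week <- map_lgl(joined$new_data, is.null)
missing_week
# FALSE FALSE TRUE
```

Any function mapped over `new_data` therefore has to handle a NULL input before it reaches predict.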

# Modeling data
cond_model = my_data %>%
    filter(yr != cur_year) %>%
    group_by(ref_period) %>%
    nest(.key = cond_data)

# Create prediction data
cur_cond = my_data %>%
    filter(yr == cur_year) %>% 
    group_by(ref_period) %>% 
    nest( .key = new_data )

# Join these together
cond_model = left_join(cond_model, cur_cond, by = "ref_period")

# Models
cond_model = cond_model %>% 
    mutate(model = map(cond_data,
                       ~lm(pct_trend ~ PCT.EXCELLENT + PCT.GOOD + 
                               PCT.FAIR + PCT.POOR + PCT.VERY.POOR, data = .x) ) )

Put an ifelse in to return NA when there is no prediction data.

# Predictions
cond_model %>% 
    mutate(pred_pct_trend = map2_dbl(model, new_data,
                                     ~ifelse(is.null(.y), NA, 
                                             predict(.x, newdata = .y) ) ) )

# A tibble: 53 x 5
   ref_period        cond_data         new_data    model pred_pct_trend
        <dbl>           <list>           <list>   <list>          <dbl>
 1          1 <tibble [7 x 8]> <tibble [1 x 8]> <S3: lm>       83.08899
 2          2 <tibble [7 x 8]> <tibble [1 x 8]> <S3: lm>      114.39089
 3          3 <tibble [7 x 8]> <tibble [1 x 8]> <S3: lm>      215.02055
 4          4 <tibble [7 x 8]> <tibble [1 x 8]> <S3: lm>      130.24556
 5          5 <tibble [7 x 8]> <tibble [1 x 8]> <S3: lm>      112.86516
 6          6 <tibble [7 x 8]> <tibble [1 x 8]> <S3: lm>      107.29866
 7          7 <tibble [7 x 8]> <tibble [1 x 8]> <S3: lm>       52.11526
 8          8 <tibble [7 x 8]> <tibble [1 x 8]> <S3: lm>      106.22482
 9          9 <tibble [7 x 8]> <tibble [1 x 8]> <S3: lm>      128.40858
10         10 <tibble [7 x 8]> <tibble [1 x 8]> <S3: lm>      108.10306
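The same pattern can be sketched on a smaller scale. This is only an illustration under stated assumptions: the data, the predictors `x1` and `x2`, and the `extra` column are made up, and a plain `if`/`else` with `NA_real_` is swapped in for the `ifelse` call above. It shows that predict matches `newdata` columns to the model's predictors by name, so a one-row data frame containing all predictors (plus any unused columns) is enough.

```r
library(purrr)

set.seed(1)

# Hypothetical training data with two predictors
train <- data.frame(x1 = runif(20), x2 = runif(20))
train$y <- 1 + 2 * train$x1 - 3 * train$x2 + rnorm(20, sd = 0.1)

fit <- lm(y ~ x1 + x2, data = train)

# One one-row data frame of new predictors, and one missing entry
models   <- list(fit, fit)
new_data <- list(data.frame(x1 = 0.2, x2 = 0.4, extra = 99), NULL)

preds <- map2_dbl(models, new_data, function(m, nd) {
  # Return a typed NA when there is nothing to predict from
  if (is.null(nd)) NA_real_ else unname(predict(m, newdata = nd))
})
preds
```

The first element is a prediction close to the true value of 0.2 for these inputs, and the second is NA. Using `if`/`else` avoids evaluating predict on a NULL input and keeps the return type a double in both branches.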

Answer 1: score 2, accepted, 2017-07-28 19:35:59
